# Wayland → WebRTC Remote Desktop Backend

## Technical Design Document

## Table of Contents

1. [System Architecture](#system-architecture)
2. [Technology Stack](#technology-stack)
3. [Key Components Design](#key-components-design)
4. [Data Flow Optimization](#data-flow-optimization)
5. [Low Latency Optimization](#low-latency-optimization)
6. [Implementation Roadmap](#implementation-roadmap)
7. [Potential Challenges & Solutions](#potential-challenges--solutions)

---
## System Architecture

### High-Level Architecture

```
┌─────────────────────────────────────────────────────────────────────┐
│                          Client Browser                             │
│                        (WebRTC Receiver)                            │
└─────────────────────────────┬───────────────────────────────────────┘
                              │ WebRTC (UDP/TCP)
                              │ Signaling (WebSocket/HTTP)
                              ▼
┌─────────────────────────────────────────────────────────────────────┐
│                         Signaling Server                            │
│                   (WebSocket/WebSocket Secure)                      │
│   - Session Management                                              │
│   - SDP Exchange                                                    │
│   - ICE Candidates                                                  │
└─────────────────────────────┬───────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────────┐
│                        Rust Backend Server                          │
├─────────────────────────────────────────────────────────────────────┤
│   ┌──────────────┐    ┌──────────────┐    ┌──────────────┐          │
│   │   Capture    │    │   Encoder    │    │    WebRTC    │          │
│   │   Manager    │───▶│   Pipeline   │───▶│  Transport   │          │
│   └──────────────┘    └──────────────┘    └──────────────┘          │
│          │                   │                    │                 │
│          ▼                   ▼                    ▼                 │
│   ┌──────────────┐    ┌──────────────┐    ┌──────────────┐          │
│   │   PipeWire   │    │    Video     │    │     Data     │          │
│   │    Portal    │    │   Encoder    │    │   Channels   │          │
│   │    (xdg-     │    │   (H.264/    │    │    (Input/   │          │
│   │   desktop-   │    │  H.265/VP9)  │    │   Control)   │          │
│   │    portal)   │    └──────────────┘    └──────────────┘          │
│   └──────────────┘                                                  │
│                                                                     │
│   ┌─────────────────────────────────────────────────────────────┐   │
│   │                 Zero-Copy Buffer Manager                    │   │
│   │   - DMA-BUF Import/Export                                   │   │
│   │   - Shared Memory Pools                                     │   │
│   │   - Memory Ownership Tracking                               │   │
│   └─────────────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────────┐
│                        Wayland Compositor                           │
│                     (PipeWire Screen Sharing)                       │
└─────────────────────────────────────────────────────────────────────┘
```
### Component Breakdown

#### 1. Capture Manager

**Responsibilities:**
- Interface with PipeWire xdg-desktop-portal
- Request screen capture permissions
- Receive DMA-BUF frames
- Manage frame buffer lifecycle

**Key Technologies:**
- `pipewire` crate for PipeWire protocol
- `wayland-client` for Wayland protocol
- `ashpd` for desktop portals

```rust
pub struct CaptureManager {
    pipewire_connection: Rc<PipewireConnection>,
    stream_handle: Option<StreamHandle>,
    frame_sender: async_channel::Sender<CapturedFrame>,
    config: CaptureConfig,
}

pub struct CaptureConfig {
    pub frame_rate: u32,
    pub quality: QualityLevel,
    pub screen_region: Option<ScreenRegion>,
}

pub enum QualityLevel {
    Low,
    Medium,
    High,
    Ultra,
}

pub struct CapturedFrame {
    pub dma_buf: DmaBufHandle,
    pub width: u32,
    pub height: u32,
    pub format: PixelFormat,
    pub timestamp: u64,
}
```
#### 2. Encoder Pipeline

**Responsibilities:**
- Receive raw frames from capture
- Encode to H.264/H.265/VP9
- Hardware acceleration (VA-API, NVENC, VideoToolbox)
- Bitrate adaptation

**Zero-Copy Strategy:**
- Direct DMA-BUF to encoder (no CPU copies)
- Encoder outputs to memory-mapped buffers
- WebRTC consumes encoded buffers directly

```rust
pub struct EncoderPipeline {
    encoder: Box<dyn VideoEncoder>,
    config: EncoderConfig,
    stats: EncoderStats,
}

pub trait VideoEncoder: Send + Sync {
    fn encode_frame(
        &mut self,
        frame: CapturedFrame,
    ) -> Result<EncodedFrame, EncoderError>;

    fn set_bitrate(&mut self, bitrate: u32) -> Result<(), EncoderError>;

    fn request_keyframe(&mut self) -> Result<(), EncoderError>;
}

pub struct EncodedFrame {
    pub data: Bytes, // Zero-copy Bytes wrapper
    pub is_keyframe: bool,
    pub timestamp: u64,
    pub sequence_number: u64,
}
```
#### 3. WebRTC Transport

**Responsibilities:**
- WebRTC peer connection management
- Media track (video) and data channels
- RTP packetization
- ICE/STUN/TURN handling
- Congestion control

**Libraries:**
- `webrtc` crate (webrtc-rs) or custom WebRTC implementation

```rust
pub struct WebRtcTransport {
    peer_connection: RTCPeerConnection,
    video_track: RTCVideoTrack,
    data_channel: Option<RTCDataChannel>,
    config: WebRtcConfig,
}

pub struct WebRtcConfig {
    pub stun_servers: Vec<String>,
    pub turn_servers: Vec<TurnServer>,
    pub ice_transport_policy: IceTransportPolicy,
}

pub struct TurnServer {
    pub urls: Vec<String>,
    pub username: String,
    pub credential: String,
}
```
#### 4. Zero-Copy Buffer Manager

**Responsibilities:**
- Manage DMA-BUF lifecycle
- Pool pre-allocated memory
- Track ownership via Rust types
- Coordinate with PipeWire memory pools

```rust
pub struct BufferManager {
    dma_buf_pool: Pool<DmaBufHandle>,
    encoded_buffer_pool: Pool<Bytes>,
    max_buffers: usize,
}

impl BufferManager {
    pub fn acquire_dma_buf(&self) -> Option<DmaBufHandle> {
        self.dma_buf_pool.acquire()
    }

    pub fn release_dma_buf(&self, handle: DmaBufHandle) {
        self.dma_buf_pool.release(handle)
    }

    pub fn acquire_encoded_buffer(&self, size: usize) -> Option<Bytes> {
        self.encoded_buffer_pool.acquire_with_size(size)
    }
}
```
### Data Flow

```
Wayland Compositor
        │
        │  DMA-BUF (GPU memory)
        ▼
PipeWire Portal
        │
        │  DMA-BUF file descriptor
        ▼
Capture Manager
        │
        │  CapturedFrame { dma_buf, ... }
        │  (Zero-copy ownership transfer)
        ▼
Buffer Manager
        │
        │  DmaBufHandle (moved, not copied)
        ▼
Encoder Pipeline
        │
        │  EncodedFrame { data: Bytes, ... }
        │  (Zero-copy Bytes wrapper)
        ▼
WebRTC Transport
        │
        │  RTP Packets (reference to Bytes)
        ▼
Network (UDP/TCP)
        │
        ▼
Client Browser
```

---
## Technology Stack

### Core Dependencies

```toml
[dependencies]
# Async Runtime
tokio = { version = "1.35", features = ["full"] }  # "full" already enables the multi-threaded runtime
async-trait = "0.1"

# Wayland & PipeWire
wayland-client = "0.31"
wayland-protocols = "0.31"
pipewire = "0.8"
ashpd = "0.8"

# Video Encoding (Low Latency)
openh264 = { version = "0.6", optional = true }
x264 = { version = "0.4", optional = true }
nvenc = { version = "0.1", optional = true }
vpx = { version = "0.1", optional = true }

# Hardware Acceleration (Low Latency)
libva = { version = "0.14", optional = true }          # VA-API
nvidia-encode = { version = "0.5", optional = true }   # NVENC

# WebRTC (Low Latency Configuration)
# Optional so the `webrtc-rs` feature below can gate it
webrtc = { version = "0.11", optional = true }  # webrtc-rs

# Memory & Zero-Copy
bytes = "1.5"
memmap2 = "0.9"
shared_memory = "0.12"

# Lock-free data structures for minimal contention
crossbeam = { version = "0.8", features = ["std"] }
crossbeam-channel = "0.5"
crossbeam-queue = "0.3"
parking_lot = "0.12"  # Faster mutexes

# Serialization
serde = { version = "1.0", features = ["derive"] }
serde_json = "1.0"

# Logging & Tracing
tracing = "0.1"
tracing-subscriber = "0.3"
# Optional so the `monitoring` feature below can gate it
tracing-opentelemetry = { version = "0.22", optional = true }  # For latency monitoring

# Metrics & Monitoring
prometheus = { version = "0.13", optional = true }
metrics = "0.21"

# Error Handling
anyhow = "1.0"
thiserror = "1.0"

# Utilities
regex = "1.10"
uuid = { version = "1.6", features = ["v4", "serde", "fast-rng"] }
instant = "0.1"  # Cross-platform Instant shim (std::time::Instant natively)

[features]
default = ["software-encoder", "webrtc-rs"]

# Encoder Options
software-encoder = ["x264", "openh264"]
hardware-vaapi = ["libva"]
hardware-nvenc = ["nvidia-encode"]
all-encoders = ["software-encoder", "hardware-vaapi", "hardware-nvenc"]

# WebRTC Implementation
webrtc-rs = ["webrtc"]
custom-webrtc = []

# Low Latency Features
low-latency = []
ultra-low-latency = ["low-latency", "all-encoders"]

# Monitoring
monitoring = ["prometheus", "tracing-opentelemetry"]

# Development
dev = ["monitoring", "all-encoders"]
```
### Encoder Options

| Encoder | Hardware | Performance | Quality | License | Use Case |
|---------|----------|-------------|---------|---------|----------|
| H.264 (x264) | CPU | Medium | High | GPL | Fallback |
| H.264 (VA-API) | GPU | High | Medium | Open Source | Linux Intel/AMD |
| H.264 (NVENC) | GPU (NVIDIA) | Very High | High | Proprietary | NVIDIA GPUs |
| H.265 (HEVC) | GPU | High | Very High | Mixed | Bandwidth-constrained |
| VP9 | CPU/GPU | Medium | High | BSD | Open Web |
| AV1 | GPU | Medium | Very High | Open Source | Future-proof |

**Recommended Primary:** VA-API H.264 (Linux), NVENC H.264 (NVIDIA)

**Recommended Fallback:** x264 H.264 (software)
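The fallback order above can be expressed as a small selection routine. This is a minimal sketch: the availability flags are assumptions standing in for real startup probes (querying NVENC and VA-API), not an API of any of the listed crates.

```rust
/// Which encoder backend the pipeline should use, in preference order.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EncoderBackend {
    Nvenc, // NVIDIA hardware encoder
    Vaapi, // Intel/AMD hardware encoder
    X264,  // software fallback, always available
}

/// Pick the best available backend: hardware first, software last.
pub fn select_encoder(nvenc_available: bool, vaapi_available: bool) -> EncoderBackend {
    if nvenc_available {
        EncoderBackend::Nvenc
    } else if vaapi_available {
        EncoderBackend::Vaapi
    } else {
        EncoderBackend::X264
    }
}
```

In a real build the two flags would come from probing the drivers once at startup, so a machine without a usable GPU encoder silently degrades to x264.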
### WebRTC Libraries

**Option 1: webrtc-rs** (Recommended)
- Pure Rust implementation
- Active development
- Good WebRTC spec compliance
- Zero-copy support for media

**Option 2: Custom Implementation**
- Use `webrtc` crate as base
- Add specialized zero-copy optimizations
- Tighter integration with encoder pipeline

---
## Key Components Design

### 1. Wayland Screen Capture Module

```rust
// src/capture/mod.rs
use std::os::unix::io::RawFd;
use std::ptr;

use async_channel::{Receiver, Sender};
use pipewire as pw;
use pipewire::properties;
use pipewire::stream::StreamFlags;

pub struct WaylandCapture {
    core: pw::Core,
    context: pw::Context,
    main_loop: pw::MainLoop,
    stream: pw::stream::Stream,
    frame_sender: Sender<CapturedFrame>,
    frame_receiver: Receiver<CapturedFrame>,
}

impl WaylandCapture {
    pub async fn new(config: CaptureConfig) -> Result<Self, CaptureError> {
        let main_loop = pw::MainLoop::new()?;
        let context = pw::Context::new(&main_loop)?;
        let core = context.connect(None)?;

        // Request screen capture via xdg-desktop-portal
        let portal = Portal::new().await?;
        let session = portal.create_session(ScreenCaptureType::Monitor).await?;
        let sources = portal.request_sources(&session).await?;

        // Bounded channel: back-pressure instead of unbounded frame queueing
        let (sender, receiver) = async_channel::bounded(30);

        Ok(Self {
            stream: Self::create_stream(&context, &session, sender.clone())?,
            core,
            context,
            main_loop,
            frame_sender: sender,
            frame_receiver: receiver,
        })
    }

    fn create_stream(
        context: &pw::Context,
        session: &Session,
        sender: Sender<CapturedFrame>,
    ) -> Result<pw::stream::Stream, pw::Error> {
        let stream = pw::stream::Stream::new(
            context,
            "wl-webrtc-capture",
            properties! {
                *pw::keys::MEDIA_TYPE => "Video",
                *pw::keys::MEDIA_CATEGORY => "Capture",
                *pw::keys::MEDIA_ROLE => "Screen",
            },
        )?;

        stream.connect(
            pw::spa::direction::Direction::Input,
            None,
            StreamFlags::AUTOCONNECT | StreamFlags::MAP_BUFFERS,
        )?;

        // Set up the frame callback (zero-copy DMA-BUF). Note: the returned
        // listener must be kept alive, or the callback is unregistered.
        let _listener = stream
            .add_local_listener()
            .process(move |stream, _| {
                // New frame available - the DMA-BUF is already mapped
                Self::process_frame(stream, sender.clone());
            })
            .register()?;

        Ok(stream)
    }

    fn process_frame(
        stream: &pw::stream::Stream,
        sender: Sender<CapturedFrame>,
    ) {
        // Dequeue the buffer without copying - the DMA-BUF stays in GPU memory
        let buffer = match stream.dequeue_buffer() {
            Some(buffer) => buffer,
            None => return, // no buffer ready
        };

        let format = stream.format().expect("stream format not negotiated");

        // Create a zero-copy frame
        let frame = CapturedFrame {
            dma_buf: DmaBufHandle::from_buffer(&buffer),
            width: format.size().width,
            height: format.size().height,
            format: PixelFormat::from_spa_format(&format),
            timestamp: timestamp_ns(),
        };

        // Send the frame (ownership transferred via move); drop it rather
        // than block the PipeWire loop if the channel is full
        let _ = sender.try_send(frame);
    }

    pub async fn next_frame(&self) -> CapturedFrame {
        self.frame_receiver.recv().await.expect("capture channel closed")
    }
}

// Zero-copy DMA-BUF handle
pub struct DmaBufHandle {
    fd: RawFd,
    size: usize,
    stride: u32,
    offset: u32,
}

impl DmaBufHandle {
    pub fn from_buffer(buffer: &pw::buffer::Buffer) -> Self {
        let data = &buffer.datas()[0];
        Self {
            fd: data.fd().unwrap(),
            size: data.chunk().size() as usize,
            stride: data.chunk().stride(),
            offset: data.chunk().offset(),
        }
    }

    /// Memory-map the DMA-BUF. The caller owns the returned mapping and
    /// must `munmap` it with this pointer (not NULL) when done.
    pub unsafe fn as_ptr(&self) -> *mut u8 {
        let ptr = libc::mmap(
            ptr::null_mut(),
            self.size,
            libc::PROT_READ,
            libc::MAP_SHARED,
            self.fd,
            self.offset as i64, // must be page-aligned
        );

        if ptr == libc::MAP_FAILED {
            panic!("Failed to mmap DMA-BUF");
        }

        ptr as *mut u8
    }
}

impl Drop for DmaBufHandle {
    fn drop(&mut self) {
        // Close the FD when the handle is dropped. Any mapping created via
        // `as_ptr` is unmapped by its owner with the pointer mmap returned.
        unsafe {
            libc::close(self.fd);
        }
    }
}
```
### 2. Frame Buffer Management (Zero-Copy)

```rust
// src/buffer/mod.rs
use std::collections::VecDeque;
use std::marker::PhantomData;

use bytes::{Bytes, BytesMut};

pub struct FrameBufferPool {
    dma_bufs: VecDeque<DmaBufHandle>,
    encoded_buffers: VecDeque<BytesMut>,
    max_dma_bufs: usize,
    max_encoded: usize,
}

impl FrameBufferPool {
    pub fn new(max_dma_bufs: usize, max_encoded: usize) -> Self {
        Self {
            dma_bufs: VecDeque::with_capacity(max_dma_bufs),
            encoded_buffers: VecDeque::with_capacity(max_encoded),
            max_dma_bufs,
            max_encoded,
        }
    }

    pub fn acquire_dma_buf(&mut self) -> Option<DmaBufHandle> {
        self.dma_bufs.pop_front()
    }

    pub fn release_dma_buf(&mut self, buf: DmaBufHandle) {
        if self.dma_bufs.len() < self.max_dma_bufs {
            self.dma_bufs.push_back(buf);
        }
        // Else: drop the buffer, let the OS reclaim the DMA-BUF
    }

    pub fn acquire_encoded_buffer(&mut self, size: usize) -> BytesMut {
        // Reuse a pooled buffer when its capacity suffices; the encoder
        // writes into the BytesMut and freezes it into a zero-copy Bytes
        if let Some(mut buf) = self.encoded_buffers.pop_front() {
            if buf.capacity() >= size {
                buf.clear();
                return buf;
            }
        }

        // Allocate a new buffer if needed
        BytesMut::with_capacity(size)
    }

    pub fn release_encoded_buffer(&mut self, buf: BytesMut) {
        if self.encoded_buffers.len() < self.max_encoded {
            self.encoded_buffers.push_back(buf);
        }
        // Else: drop the buffer, memory freed
    }
}

// Zero-copy frame wrapper
pub struct ZeroCopyFrame {
    pub data: Bytes, // Reference-counted, no copying
    pub metadata: FrameMetadata,
}

pub struct FrameMetadata {
    pub width: u32,
    pub height: u32,
    pub format: PixelFormat,
    pub timestamp: u64,
    pub is_keyframe: bool,
}

// Smart pointer for a mapped DMA-BUF
pub struct DmaBufPtr {
    ptr: *mut u8,
    len: usize,
    _marker: PhantomData<&'static mut [u8]>,
}

impl DmaBufPtr {
    /// Safety: `ptr` must point to a valid mapping of at least `len` bytes
    /// that outlives this wrapper.
    pub unsafe fn new(ptr: *mut u8, len: usize) -> Self {
        Self {
            ptr,
            len,
            _marker: PhantomData,
        }
    }

    pub fn as_slice(&self) -> &[u8] {
        unsafe { std::slice::from_raw_parts(self.ptr, self.len) }
    }
}

unsafe impl Send for DmaBufPtr {}
unsafe impl Sync for DmaBufPtr {}

impl Drop for DmaBufPtr {
    fn drop(&mut self) {
        // The mapping is unmapped by its owner, not here
    }
}
```
### 3. Video Encoder Integration

```rust
// src/encoder/mod.rs
use async_trait::async_trait;
use bytes::Bytes;

#[derive(Clone, Copy)]
pub enum EncoderType {
    H264Vaapi,
    H264Nvenc,
    H264X264,
    Vp9Vaapi,
}

#[derive(Clone, Copy)]
pub struct EncoderConfig {
    pub encoder_type: EncoderType,
    pub bitrate: u32,
    pub keyframe_interval: u32,
    pub preset: EncodePreset,
}

#[derive(Clone, Copy)]
pub enum EncodePreset {
    Ultrafast,
    Superfast,
    Veryfast,
    Faster,
    Fast,
    Medium,
    Slow,
    Slower,
    Veryslow,
}

#[async_trait]
pub trait VideoEncoder: Send + Sync {
    async fn encode(&mut self, frame: CapturedFrame) -> Result<EncodedFrame, EncoderError>;
    async fn reconfigure(&mut self, config: EncoderConfig) -> Result<(), EncoderError>;
    async fn request_keyframe(&mut self) -> Result<(), EncoderError>;
}

// `va` stands in for a VA-API binding crate; the exact API will differ.
pub struct VaapiEncoder {
    display: va::Display,
    context: va::Context,
    config: EncoderConfig,
    sequence_number: u64,
}

impl VaapiEncoder {
    pub fn new(config: EncoderConfig) -> Result<Self, EncoderError> {
        let display = va::Display::open(None)?;
        let context = va::Context::new(&display)?;

        Ok(Self {
            display,
            context,
            config,
            sequence_number: 0,
        })
    }
}

#[async_trait]
impl VideoEncoder for VaapiEncoder {
    async fn encode(&mut self, frame: CapturedFrame) -> Result<EncodedFrame, EncoderError> {
        // Zero-copy: import the DMA-BUF directly into a VA-API surface
        let surface = unsafe {
            self.context.import_dma_buf(
                frame.dma_buf.fd,
                frame.width,
                frame.height,
                frame.format.as_va_format(),
            )?
        };

        // Encode the frame (hardware accelerated)
        let encoded_data = self.context.encode_surface(&surface)?;

        // Wrap in zero-copy Bytes
        let bytes = Bytes::from(encoded_data);

        self.sequence_number += 1;

        Ok(EncodedFrame {
            data: bytes,
            is_keyframe: surface.is_keyframe(),
            timestamp: frame.timestamp,
            sequence_number: self.sequence_number,
        })
    }

    async fn reconfigure(&mut self, config: EncoderConfig) -> Result<(), EncoderError> {
        // Apply the new settings, then store the config
        self.context.set_bitrate(config.bitrate)?;
        self.context.set_preset(config.preset)?;
        self.config = config;
        Ok(())
    }

    async fn request_keyframe(&mut self) -> Result<(), EncoderError> {
        self.context.force_keyframe()?;
        Ok(())
    }
}

// Fallback software encoder (the `x264` binding API is sketched here)
pub struct X264Encoder {
    encoder: x264::Encoder,
    config: EncoderConfig,
    sequence_number: u64,
}

impl X264Encoder {
    pub fn new(config: EncoderConfig) -> Result<Self, EncoderError> {
        let mut params = x264::Params::default();
        params.set_width(1920);
        params.set_height(1080);
        params.set_fps(60, 1);
        params.set_bitrate(config.bitrate);
        params.set_preset(config.preset);
        params.set_tune("zerolatency");

        let encoder = x264::Encoder::open(&params)?;

        Ok(Self {
            encoder,
            config,
            sequence_number: 0,
        })
    }
}

#[async_trait]
impl VideoEncoder for X264Encoder {
    async fn encode(&mut self, frame: CapturedFrame) -> Result<EncodedFrame, EncoderError> {
        // Map the DMA-BUF into CPU memory (the one unavoidable copy on the
        // software path)
        let ptr = unsafe { frame.dma_buf.as_ptr() };
        let slice = unsafe { std::slice::from_raw_parts(ptr, frame.dma_buf.size) };

        // Convert to YUV if needed (convert_to_yuv is an internal helper)
        let yuv_frame = self.convert_to_yuv(slice, frame.width, frame.height)?;

        // Encode the frame
        let encoded_data = self.encoder.encode(&yuv_frame)?;

        self.sequence_number += 1;

        Ok(EncodedFrame {
            data: Bytes::from(encoded_data),
            is_keyframe: self.encoder.is_keyframe(),
            timestamp: frame.timestamp,
            sequence_number: self.sequence_number,
        })
    }

    async fn reconfigure(&mut self, config: EncoderConfig) -> Result<(), EncoderError> {
        self.config = config;
        // Reopen the encoder with the new params
        Ok(())
    }

    async fn request_keyframe(&mut self) -> Result<(), EncoderError> {
        self.encoder.force_keyframe();
        Ok(())
    }
}
```
### 4. WebRTC Signaling and Data Transport

```rust
// src/webrtc/mod.rs
use std::collections::HashMap;
use std::sync::Arc;

use bytes::Bytes;
use serde::{Deserialize, Serialize};
use tokio::sync::Mutex;
use webrtc::{
    api::APIBuilder,
    data_channel::RTCDataChannel,
    ice_transport::ice_server::RTCIceServer,
    peer_connection::{
        configuration::RTCConfiguration,
        sdp::session_description::RTCSessionDescription,
        RTCPeerConnection,
    },
    track::track_local::{
        track_local_static_sample::TrackLocalStaticSample, TrackLocal,
    },
};

pub struct WebRtcServer {
    api: webrtc::api::API,
    peer_connections: Arc<Mutex<HashMap<String, PeerConnection>>>,
    signaling_server: SignalingServer,
}

impl WebRtcServer {
    pub async fn new(config: WebRtcConfig) -> Result<Self, WebRtcError> {
        let api = APIBuilder::new().build();

        // Assumes WebRtcConfig also carries the signaling listen address
        let signaling_server = SignalingServer::new(config.signaling_addr).await?;

        Ok(Self {
            api,
            peer_connections: Arc::new(Mutex::new(HashMap::new())),
            signaling_server,
        })
    }

    pub async fn create_peer_connection(
        &self,
        session_id: String,
        video_track: TrackLocalStaticSample,
    ) -> Result<String, WebRtcError> {
        let config = RTCConfiguration {
            ice_servers: vec![RTCIceServer {
                urls: self.signaling_server.stun_servers(),
                ..Default::default()
            }],
            ..Default::default()
        };

        let pc = self.api.new_peer_connection(config).await?;

        // Add the video track (keep an Arc so frames can be written later)
        let video_track = Arc::new(video_track);
        let _rtp_sender = pc
            .add_track(Arc::clone(&video_track) as Arc<dyn TrackLocal + Send + Sync>)
            .await?;

        // Set the ICE candidate handler
        pc.on_ice_candidate(Box::new(move |candidate| {
            Box::pin(async move {
                if let Some(_candidate) = candidate {
                    // Forward the candidate to the signaling server
                    // ...
                }
            })
        }));

        // Store the peer connection
        self.peer_connections
            .lock()
            .await
            .insert(session_id.clone(), PeerConnection::new(pc, video_track));

        Ok(session_id)
    }

    pub async fn send_video_frame(
        &self,
        session_id: &str,
        frame: EncodedFrame,
    ) -> Result<(), WebRtcError> {
        let peer_connections = self.peer_connections.lock().await;

        if let Some(peer) = peer_connections.get(session_id) {
            peer.video_track
                .write_sample(&webrtc::media::Sample {
                    data: frame.data.clone(), // Bytes clone: refcount bump, no copy
                    // Sample duration is one frame interval, not the timestamp
                    duration: std::time::Duration::from_millis(16),
                    ..Default::default()
                })
                .await?;
        }

        Ok(())
    }
}

pub struct PeerConnection {
    pc: RTCPeerConnection,
    video_track: Arc<TrackLocalStaticSample>,
    data_channel: Option<Arc<RTCDataChannel>>,
}

impl PeerConnection {
    pub fn new(pc: RTCPeerConnection, video_track: Arc<TrackLocalStaticSample>) -> Self {
        Self {
            pc,
            video_track,
            data_channel: None,
        }
    }

    pub async fn create_offer(&mut self) -> Result<RTCSessionDescription, WebRtcError> {
        let offer = self.pc.create_offer(None).await?;
        self.pc.set_local_description(offer.clone()).await?;
        Ok(offer)
    }

    pub async fn set_remote_description(
        &mut self,
        desc: RTCSessionDescription,
    ) -> Result<(), WebRtcError> {
        Ok(self.pc.set_remote_description(desc).await?)
    }

    pub async fn create_answer(&mut self) -> Result<RTCSessionDescription, WebRtcError> {
        let answer = self.pc.create_answer(None).await?;
        self.pc.set_local_description(answer.clone()).await?;
        Ok(answer)
    }
}

// Data channel for input/control
pub struct DataChannelManager {
    channels: HashMap<String, Arc<RTCDataChannel>>,
}

impl DataChannelManager {
    pub async fn send_input(&self, channel_id: &str, input: InputEvent) -> Result<()> {
        if let Some(channel) = self.channels.get(channel_id) {
            let data = serde_json::to_vec(&input)?;
            channel.send(&Bytes::from(data)).await?;
        }
        Ok(())
    }

    pub fn on_input<F>(&mut self, channel_id: String, callback: F)
    where
        F: Fn(InputEvent) + Send + Sync + 'static,
    {
        if let Some(channel) = self.channels.get(&channel_id) {
            channel.on_message(Box::new(move |msg| {
                if let Ok(input) = serde_json::from_slice::<InputEvent>(&msg.data) {
                    callback(input);
                }
                Box::pin(async {})
            }));
        }
    }
}

#[derive(Debug, Serialize, Deserialize)]
pub enum InputEvent {
    MouseMove { x: f32, y: f32 },
    MouseClick { button: MouseButton },
    KeyPress { key: String },
    KeyRelease { key: String },
}

#[derive(Debug, Serialize, Deserialize)]
pub enum MouseButton {
    Left,
    Right,
    Middle,
}
```
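The `SignalingServer` used above is referenced but never defined in this section. As a hedged sketch (names and message shapes are assumptions, not a fixed protocol), the messages it relays can be modeled as a plain enum, with each variant mapping to exactly one peer-connection action on the backend:

```rust
/// Hypothetical signaling messages exchanged over the WebSocket channel.
#[derive(Debug, Clone, PartialEq)]
pub enum SignalingMessage {
    Offer { session_id: String, sdp: String },
    Answer { session_id: String, sdp: String },
    IceCandidate { session_id: String, candidate: String },
    Bye { session_id: String },
}

impl SignalingMessage {
    /// The peer-connection action each message triggers on the backend.
    pub fn expected_action(&self) -> &'static str {
        match self {
            SignalingMessage::Offer { .. } => "set_remote_description + create_answer",
            SignalingMessage::Answer { .. } => "set_remote_description",
            SignalingMessage::IceCandidate { .. } => "add_ice_candidate",
            SignalingMessage::Bye { .. } => "close_peer_connection",
        }
    }
}
```

Keeping the routing table this explicit makes the signaling loop a single `match`, which is easy to extend when renegotiation or stats messages are added later.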
### 5. IPC Layer (Optional)

```rust
// src/ipc/mod.rs
use serde::{Deserialize, Serialize};
use tokio::io::AsyncReadExt;
use tokio::net::UnixListener;

pub struct IpcServer {
    listener: UnixListener,
}

impl IpcServer {
    pub async fn new(socket_path: &str) -> Result<Self> {
        // Remove a stale socket if present
        let _ = std::fs::remove_file(socket_path);

        let listener = UnixListener::bind(socket_path)?;

        Ok(Self { listener })
    }

    pub async fn run(&self, sender: async_channel::Sender<IpcMessage>) {
        loop {
            match self.listener.accept().await {
                Ok((mut stream, _)) => {
                    let sender = sender.clone();
                    tokio::spawn(async move {
                        let mut buf = [0; 1024];
                        loop {
                            match stream.read(&mut buf).await {
                                Ok(0) => break, // connection closed
                                Ok(n) => {
                                    // Assumes one JSON message per read;
                                    // production code should length-prefix
                                    // messages to frame them reliably
                                    if let Ok(msg) =
                                        serde_json::from_slice::<IpcMessage>(&buf[..n])
                                    {
                                        let _ = sender.send(msg).await;
                                    }
                                }
                                Err(_) => break,
                            }
                        }
                    });
                }
                Err(_) => continue,
            }
        }
    }
}

#[derive(Debug, Serialize, Deserialize)]
pub enum IpcMessage {
    StartCapture { session_id: String },
    StopCapture { session_id: String },
    SetQuality { level: QualityLevel },
    GetStatus,
}
```

---
## Data Flow Optimization

### Zero-Copy Pipeline Stages

```
Stage 1: Capture
  Input:  Wayland Compositor (GPU memory)
  Output: DMA-BUF file descriptor
  Copy:   None (Zero-copy)

Stage 2: Buffer Manager
  Input:  DMA-BUF FD
  Output: DmaBufHandle (RAII wrapper)
  Copy:   None (Zero-copy ownership transfer)

Stage 3: Encoder
  Input:  DmaBufHandle
  Output: Bytes (reference-counted)
  Copy:   None (DMA-BUF imported directly to GPU encoder)

Stage 4: WebRTC
  Input:  Bytes
  Output: RTP packets (references to Bytes)
  Copy:   None (Zero-copy to socket buffers)

Stage 5: Network
  Input:  RTP packets
  Output: UDP datagrams
  Copy:   Minimal (kernel space only)
```

### Memory Ownership Transfer
```rust
// Example: ownership transfer through the pipeline
async fn process_frame_pipeline(
    capture: WaylandCapture,
    mut encoder: VaapiEncoder,
    webrtc: WebRtcServer,
) -> Result<()> {
    loop {
        // Stage 1: Capture (ownership moves from PipeWire to our code)
        let frame = capture.next_frame().await; // CapturedFrame owns DmaBufHandle

        // Stage 2: Encode (ownership moved, not copied)
        let encoded = encoder.encode(frame).await?; // EncodedFrame owns Bytes

        // Stage 3: Send (Bytes is reference-counted, no copy)
        webrtc.send_video_frame("session-123", encoded).await?;

        // Ownership transferred all the way without copying
    }
}
```
### Buffer Sharing Mechanisms

#### 1. DMA-BUF (Primary)
- GPU memory buffers
- Exported as file descriptors
- Zero-copy to hardware encoders
- Limited to same GPU/driver

```rust
pub fn export_dma_buf(surface: &va::Surface) -> Result<DmaBufHandle> {
    let fd = surface.export_dma_buf()?;
    Ok(DmaBufHandle {
        fd,
        size: surface.size(),
        stride: surface.stride(),
        offset: 0,
    })
}
```
#### 2. Shared Memory (Fallback)
- POSIX shared memory (`shm_open`)
- For software encoding path
- Copy from DMA-BUF to shared memory

```rust
pub fn create_shared_buffer(size: usize) -> Result<SharedBuffer> {
    let name = format!("/wl-webrtc-{}", uuid::Uuid::new_v4());
    // Note the octal literal: Rust spells the mode 0o666, not 0666
    let fd = shm_open(&name, O_CREAT | O_RDWR, 0o666)?;
    ftruncate(fd, size as i64)?;

    let ptr = unsafe { mmap(ptr::null_mut(), size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0) };

    Ok(SharedBuffer {
        ptr,
        size,
        fd,
        name,
    })
}
```
#### 3. Memory-Mapped Files (Alternative)
- For persistent caching
- Cross-process communication
- Used for frame buffering

```rust
pub struct MappedFile {
    file: File,
    ptr: *mut u8,
    size: usize,
}

impl MappedFile {
    pub fn new(path: &Path, size: usize) -> Result<Self> {
        let file = OpenOptions::new()
            .read(true)
            .write(true)
            .create(true)
            .open(path)?;

        file.set_len(size as u64)?;

        // mmap signals failure via MAP_FAILED, not a Result
        let ptr = unsafe {
            mmap(
                ptr::null_mut(),
                size,
                PROT_READ | PROT_WRITE,
                MAP_SHARED,
                file.as_raw_fd(),
                0,
            )
        };
        if ptr == MAP_FAILED {
            return Err(std::io::Error::last_os_error().into());
        }

        Ok(Self {
            file,
            ptr: ptr as *mut u8,
            size,
        })
    }
}
```
### Pipeline Optimization Strategies

#### 1. Parallel Encoding
```rust
// Keep encoders for several quality levels warm in parallel and switch
// between them based on measured bandwidth
pub struct AdaptiveEncoder {
    encoders: Vec<Box<dyn VideoEncoder>>,
    active_encoder: usize,
    bandwidth_monitor: BandwidthMonitor,
}

impl AdaptiveEncoder {
    pub async fn encode_adaptive(&mut self, frame: CapturedFrame) -> Result<EncodedFrame> {
        let bandwidth = self.bandwidth_monitor.current_bandwidth();

        // Pick the encoder tier that fits the available bandwidth
        let new_encoder = match bandwidth {
            b if b < 500_000 => 0,   // Low bitrate
            b if b < 2_000_000 => 1, // Medium bitrate
            _ => 2,                  // High bitrate
        };

        if new_encoder != self.active_encoder {
            self.active_encoder = new_encoder;
        }

        self.encoders[self.active_encoder].encode(frame).await
    }
}
```
#### 2. Frame Skipping
```rust
use std::time::Instant;

pub struct FrameSkipper {
    target_fps: u32,
    last_frame_time: Instant,
    skip_count: u32,
}

impl FrameSkipper {
    pub fn should_skip(&mut self) -> bool {
        let now = Instant::now();
        let elapsed = now.duration_since(self.last_frame_time).as_millis();
        let frame_interval = 1000 / self.target_fps as u128;

        // Skip frames that arrive faster than the target frame interval
        if elapsed < frame_interval {
            self.skip_count += 1;
            return true;
        }

        self.last_frame_time = now;
        self.skip_count = 0;
        false
    }
}
```
#### 3. Region of Interest (ROI)

```rust
pub struct RegionEncoder {
    full_encoder: Box<dyn VideoEncoder>,
    roi_encoder: Box<dyn VideoEncoder>,
    current_region: Option<ScreenRegion>,
}

impl RegionEncoder {
    pub async fn encode_roi(
        &mut self,
        frame: CapturedFrame,
        roi: Option<ScreenRegion>,
    ) -> Result<EncodedFrame> {
        if let Some(region) = roi {
            // Encode only the ROI with higher quality
            let cropped = self.crop_frame(frame, region)?;
            self.roi_encoder.encode(cropped).await
        } else {
            // Encode the full frame
            self.full_encoder.encode(frame).await
        }
    }

    fn crop_frame(&self, mut frame: CapturedFrame, region: ScreenRegion) -> Result<CapturedFrame> {
        // Adjust DMA-BUF offsets for the region (metadata-only sketch)
        frame.width = region.width;
        frame.height = region.height;
        Ok(frame)
    }
}
```

---

## Low Latency Optimization

### Design Philosophy

To achieve 15-25ms latency on local networks, we prioritize:

1. **Speed over completeness**: Fast, low-latency delivery matters more than perfect reliability
2. **Minimize buffering**: Small buffers at every stage
3. **Zero-copy everywhere**: Eliminate CPU memory copies
4. **Hardware acceleration**: Use the GPU for all intensive operations
5. **Predictive timing**: Reduce wait times with accurate timing

### 1. Encoder Optimization

#### Hardware Encoder Configuration

```rust
pub struct LowLatencyEncoderConfig {
    // Codec settings
    pub codec: VideoCodec,

    // Low-latency specific
    pub gop_size: u32,     // Small GOP: 8-15 frames
    pub b_frames: u32,     // Zero B-frames for minimal latency
    pub max_b_frames: u32, // Always 0 for low latency
    pub lookahead: u32,    // Minimal lookahead: 0-2 frames

    // Rate control
    pub rc_mode: RateControlMode, // CBR or VBR with strict constraints
    pub bitrate: u32,             // Adaptive bitrate
    pub max_bitrate: u32,         // Tight max constraint
    pub min_bitrate: u32,
    pub vbv_buffer_size: u32,     // Very small VBV buffer
    pub vbv_max_rate: u32,        // Close to bitrate

    // Timing
    pub fps: u32,          // Target FPS (30-60)
    pub intra_period: u32, // Keyframe interval

    // Quality vs latency trade-offs
    pub preset: EncoderPreset, // Ultrafast/Fast
    pub tune: EncoderTune,     // zerolatency
    pub quality: u8,           // Constant quality (CRF) or CQ
}

pub enum VideoCodec {
    H264, // Best compatibility, good latency
    H265, // Better compression, slightly higher latency
    VP9,  // Open alternative
}

pub enum RateControlMode {
    CBR, // Constant bitrate - predictable
    VBR, // Variable bitrate - better quality
    CQP, // Constant quantizer - lowest latency
}

pub enum EncoderPreset {
    Ultrafast, // Lowest latency, lower quality
    Superfast,
    Veryfast, // Recommended for 15-25ms
    Faster,
}

pub enum EncoderTune {
    Zerolatency, // Mandatory for low latency
    Film,
    Animation,
}
```

#### Recommended Encoder Settings

**VA-API (Intel/AMD) - For 15-25ms latency:**

```c
// libva-specific low-latency settings
VAConfigAttrib attribs[] = {
    {VAConfigAttribRTFormat, VA_RT_FORMAT_YUV420},
    {VAConfigAttribRateControl, VA_RC_CBR},
    {VAConfigAttribEncMaxRefFrames, 1}, // Minimal reference frames
    {VAConfigAttribEncPackedHeaders, VA_ENC_PACKED_HEADER_SEQUENCE},
};

VAEncSequenceParameterBufferH264 seq_param = {
    .intra_period = 15,      // Short GOP
    .ip_period = 1,          // No B-frames
    .bits_per_second = 4000000,
    .max_num_ref_frames = 1, // Minimal references
    .time_scale = 90000,
    .num_units_in_tick = 750, // 60 FPS (fps = time_scale / (2 * num_units_in_tick))
};

VAEncPictureParameterBufferH264 pic_param = {
    .reference_frames = {
        {0, VA_FRAME_PICTURE}, // Single reference
    },
    .num_ref_idx_l0_active_minus1 = 0,
    .num_ref_idx_l1_active_minus1 = 0,
    .pic_fields.bits.idr_pic_flag = 0,
    .pic_fields.bits.reference_pic_flag = 1,
};

VAEncSliceParameterBufferH264 slice_param = {
    .num_ref_idx_l0_active_minus1 = 0,
    .num_ref_idx_l1_active_minus1 = 0,
    .disable_deblocking_filter_idc = 1, // Faster
};
```

**NVENC (NVIDIA) - For 15-25ms latency:**

```rust
// NVENC low-latency configuration
let mut create_params = NV_ENC_INITIALIZE_PARAMS::default();
create_params.encodeGUID = NV_ENC_CODEC_H264_GUID;
create_params.presetGUID = NV_ENC_PRESET_P4_GUID; // Pair with low-latency tuning info

let mut config = NV_ENC_CONFIG::default();
config.profileGUID = NV_ENC_H264_PROFILE_BASELINE_GUID; // Faster encoding
config.rcParams.rateControlMode = NV_ENC_PARAMS_RC_CBR;
config.rcParams.averageBitRate = 4_000_000;
config.rcParams.maxBitRate = 4_000_000;
config.rcParams.vbvBufferSize = 4_000_000 / 60; // ~One frame's worth - keep the VBV small
config.rcParams.vbvInitialDelay = 0;            // Minimal delay

let mut h264_config = unsafe { config.encodeCodecConfig.h264Config };
h264_config.enableIntraRefresh = 1;
h264_config.repeatSPSPPS = 1;
h264_config.enableConstrainedEncoding = 1;
h264_config.frameNumD = 0;
h264_config.sliceMode = NV_ENC_SLICE_MODE_AUTOSELECT;

// Low-latency specific
h264_config.maxNumRefFrames = 1; // Minimal references
h264_config.idrPeriod = 15;      // Short GOP (keyframe every 15 frames)
```

**x264 (Software) - For 50-100ms latency:**

```rust
// x264 parameters for low latency.
// Note: in a real build, apply the preset/tune first and then override
// individual fields, since x264_param_apply_preset() resets most of them.
let mut param = x264_param_t {
    i_width: 1920,
    i_height: 1080,
    i_fps_num: 60,
    i_fps_den: 1,

    // Rate control
    i_bitrate: 4000,  // 4 Mbps (x264 takes kbps)
    i_keyint_max: 15, // Short GOP
    b_intra_refresh: 1,

    // Low latency
    b_repeat_headers: 1,
    b_annexb: 1,
    i_scenecut_threshold: 0, // Disable scene detection

    // No B-frames for latency
    i_bframe: 0,
    i_bframe_adaptive: 0,
    i_bframe_pyramid: 0,

    // References
    i_frame_reference: 1, // Minimal references
};

// Apply preset and tune
x264_param_apply_preset(&mut param, "superfast");
x264_param_apply_tune(&mut param, "zerolatency");
```

#### Dynamic Bitrate vs Latency Trade-offs

```rust
pub struct AdaptiveBitrateController {
    target_latency_ms: u32,
    current_bitrate: u32,
    frame_rate: u32,
    network_quality: NetworkQuality,
    buffer_depth_ms: u32,
}

pub struct NetworkQuality {
    bandwidth_mbps: f64,
    latency_ms: u32,
    packet_loss_rate: f64,
    jitter_ms: u32,
}

impl AdaptiveBitrateController {
    pub fn update_target_bitrate(&mut self, measured_latency_ms: u32) -> u32 {
        let latency_ratio = measured_latency_ms as f64 / self.target_latency_ms as f64;

        if latency_ratio > 1.5 {
            // Latency far too high - reduce bitrate aggressively
            self.current_bitrate = (self.current_bitrate as f64 * 0.7) as u32;
        } else if latency_ratio > 1.2 {
            // Moderately high - reduce bitrate
            self.current_bitrate = (self.current_bitrate as f64 * 0.85) as u32;
        } else if latency_ratio < 0.8 {
            // Headroom available - increase bitrate
            self.current_bitrate = (self.current_bitrate as f64 * 1.1) as u32;
        }

        // Clamp to reasonable bounds
        self.current_bitrate = self.current_bitrate.clamp(1_000_000, 8_000_000);

        self.current_bitrate
    }
}
```

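To see the control law in action, here is a standalone sketch; `update_bitrate` is a hypothetical free-function mirror of `update_target_bitrate`, written only so the back-off can be exercised without the surrounding structs:

```rust
// Hypothetical standalone mirror of the controller's adjustment rule.
fn update_bitrate(current: u32, measured_latency_ms: u32, target_latency_ms: u32) -> u32 {
    let ratio = measured_latency_ms as f64 / target_latency_ms as f64;
    let next = if ratio > 1.5 {
        (current as f64 * 0.7) as u32 // far too high: back off hard
    } else if ratio > 1.2 {
        (current as f64 * 0.85) as u32 // moderately high: back off gently
    } else if ratio < 0.8 {
        (current as f64 * 1.1) as u32 // headroom: probe upward
    } else {
        current
    };
    next.clamp(1_000_000, 8_000_000)
}

fn main() {
    // Sustained 40ms latency against a 25ms target ratchets the bitrate down.
    let mut bitrate = 8_000_000;
    for _ in 0..5 {
        bitrate = update_bitrate(bitrate, 40, 25);
    }
    assert!(bitrate < 2_000_000);
    println!("bitrate after repeated back-off: {bitrate}");
}
```

Note the asymmetry: decreases are multiplicative and fast, increases are small, which keeps the loop from oscillating around the latency target.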
### 2. Capture Optimization

#### PipeWire DMA-BUF Zero-Copy

```rust
pub struct LowLatencyCaptureConfig {
    pub frame_rate: u32,         // 30-60 FPS
    pub zero_copy: bool,         // Always true
    pub track_damage: bool,      // Enable damage tracking
    pub partial_updates: bool,   // Encode only damaged regions
    pub buffer_pool_size: usize, // Small pool: 3-5 buffers
}

pub struct DamageTracker {
    damaged_regions: VecDeque<ScreenRegion>,
    last_frame: Option<DmaBufHandle>,
    threshold: u32, // Minimum change size to encode
}

impl DamageTracker {
    pub fn update(&mut self, new_frame: &CapturedFrame) -> Vec<ScreenRegion> {
        match &self.last_frame {
            Some(last) => {
                let regions = self.compute_damage_regions(last, new_frame);
                self.last_frame = Some(new_frame.dma_buf.clone());
                regions
            }
            None => {
                // First frame: everything is damaged
                self.last_frame = Some(new_frame.dma_buf.clone());
                vec![ScreenRegion {
                    x: 0,
                    y: 0,
                    width: new_frame.width,
                    height: new_frame.height,
                }]
            }
        }
    }

    fn compute_damage_regions(&self, last: &DmaBufHandle, new: &CapturedFrame) -> Vec<ScreenRegion> {
        // Compare frames and find changed regions. This can be done
        // efficiently on the GPU; for the MVP a simple block-based
        // comparison suffices.

        // Block size for comparison (e.g., 16x16 pixels)
        let block_size = 16;
        let blocks_x = (new.width as usize + block_size - 1) / block_size;
        let blocks_y = (new.height as usize + block_size - 1) / block_size;

        // Merge adjacent damaged blocks into regions
        // ...

        vec![] // Placeholder
    }
}
```

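The block-based comparison that `compute_damage_regions` leaves as a placeholder can be sketched on plain byte buffers. This standalone example (grayscale frames, 16x16 blocks, no merging of adjacent blocks into rectangles) is illustrative only; a real implementation would work on DMA-BUF contents, ideally on the GPU:

```rust
const BLOCK: usize = 16;

// Return the top-left (x, y) of every 16x16 block that differs between
// two same-sized grayscale frames.
fn damaged_blocks(last: &[u8], new: &[u8], width: usize, height: usize) -> Vec<(usize, usize)> {
    let mut damaged = Vec::new();
    for by in (0..height).step_by(BLOCK) {
        for bx in (0..width).step_by(BLOCK) {
            let mut changed = false;
            'scan: for y in by..(by + BLOCK).min(height) {
                for x in bx..(bx + BLOCK).min(width) {
                    if last[y * width + x] != new[y * width + x] {
                        changed = true;
                        break 'scan; // one differing pixel dirties the block
                    }
                }
            }
            if changed {
                damaged.push((bx, by));
            }
        }
    }
    damaged
}

fn main() {
    let (w, h) = (64, 32);
    let last = vec![0u8; w * h];
    let mut new = last.clone();
    new[5 * w + 40] = 255; // single changed pixel at (40, 5)
    assert_eq!(damaged_blocks(&last, &new, w, h), vec![(32, 0)]);
    println!("damaged blocks: {:?}", damaged_blocks(&last, &new, w, h));
}
```

The per-block early exit keeps the cost near one comparison per pixel in the worst case and far less for mostly-static desktops.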
#### Partial Region Encoding

```rust
pub struct RegionEncoder {
    full_encoder: Box<dyn VideoEncoder>,
    tile_encoder: Box<dyn VideoEncoder>,
    current_regions: Vec<ScreenRegion>,
}

impl RegionEncoder {
    pub async fn encode_with_regions(
        &mut self,
        frame: CapturedFrame,
        regions: Vec<ScreenRegion>,
    ) -> Result<Vec<EncodedTile>> {
        let mut encoded_tiles = Vec::new();

        if regions.is_empty() || regions.len() > 4 {
            // Too many regions, or no change information - encode the full frame.
            // Capture the dimensions before the frame is moved into the encoder.
            let (width, height) = (frame.width, frame.height);
            let encoded = self.full_encoder.encode(frame).await?;
            encoded_tiles.push(EncodedTile {
                region: ScreenRegion { x: 0, y: 0, width, height },
                data: encoded.data,
                is_keyframe: encoded.is_keyframe,
            });
        } else {
            // Encode each damaged region separately
            for region in regions {
                let cropped = self.crop_frame(&frame, &region)?;
                let encoded = self.tile_encoder.encode(cropped).await?;

                encoded_tiles.push(EncodedTile {
                    region,
                    data: encoded.data,
                    is_keyframe: encoded.is_keyframe,
                });
            }
        }

        Ok(encoded_tiles)
    }

    fn crop_frame(&self, frame: &CapturedFrame, region: &ScreenRegion) -> Result<CapturedFrame> {
        // Adjust DMA-BUF offsets for the region.
        // This is a zero-copy operation - only metadata changes.
        Ok(CapturedFrame {
            dma_buf: DmaBufHandle::from_region(&frame.dma_buf, region)?,
            width: region.width,
            height: region.height,
            format: frame.format,
            timestamp: frame.timestamp,
        })
    }
}
```

### 3. WebRTC Transport Layer Optimization

#### Low-Latency WebRTC Configuration

```rust
pub struct LowLatencyWebRtcConfig {
    // ICE and transport
    pub ice_transport_policy: IceTransportPolicy,
    pub ice_servers: Vec<IceServer>,

    // Media settings
    pub video_codecs: Vec<VideoCodecConfig>,
    pub max_bitrate: u32,
    pub min_bitrate: u32,
    pub start_bitrate: u32,

    // Buffering - minimize for low latency
    pub playout_delay_min_ms: u16, // 0-10ms (default 50ms)
    pub playout_delay_max_ms: u16, // 10-20ms (default 200ms)

    // Packetization
    pub rtp_payload_size: u16, // Smaller packets: 1200 bytes
    pub packetization_mode: PacketizationMode,

    // Feedback and retransmission
    pub nack_enabled: bool,         // Limited NACK
    pub fec_enabled: bool,          // Disable FEC for latency
    pub transport_cc_enabled: bool, // Congestion control

    // RTCP settings
    pub rtcp_report_interval_ms: u32, // Frequent: 50-100ms
}

pub struct VideoCodecConfig {
    pub name: String,
    pub clock_rate: u32,
    pub num_channels: u16,
    pub parameters: CodecParameters,
}

impl LowLatencyWebRtcConfig {
    pub fn for_ultra_low_latency() -> Self {
        Self {
            ice_transport_policy: IceTransportPolicy::All,
            ice_servers: vec![],

            video_codecs: vec![
                VideoCodecConfig {
                    name: "H264".to_string(),
                    clock_rate: 90000,
                    num_channels: 1,
                    parameters: CodecParameters {
                        profile_level_id: "42e01f".to_string(), // Baseline profile
                        packetization_mode: 1,
                        level_asymmetry_allowed: 1,
                    },
                },
            ],

            max_bitrate: 8_000_000,   // 8 Mbps max
            min_bitrate: 500_000,     // 500 Kbps min
            start_bitrate: 4_000_000, // 4 Mbps start

            // Critical: minimal playout delay
            playout_delay_min_ms: 0,  // No minimum
            playout_delay_max_ms: 20, // 20ms maximum

            // Smaller packets for lower serialization latency
            rtp_payload_size: 1200,
            packetization_mode: PacketizationMode::NonInterleaved,

            // Limited retransmission
            nack_enabled: true, // But limit the retransmission window
            fec_enabled: false, // FEC adds latency - disable it
            transport_cc_enabled: true,

            // More frequent RTCP feedback
            rtcp_report_interval_ms: 50,
        }
    }
}
```

#### Packet Loss Handling Strategy

```rust
pub enum LossHandlingStrategy {
    PreferLatency, // Drop late frames, prioritize low latency
    PreferQuality, // Retransmit, prioritize quality
    Balanced,      // Adaptive based on network conditions
}

pub struct PacketLossHandler {
    strategy: LossHandlingStrategy,
    max_retransmission_delay_ms: u32,
    nack_window_size: u32,
}

impl PacketLossHandler {
    pub fn handle_packet_loss(
        &mut self,
        sequence_number: u16,
        lost_at_ms: u64, // when the loss was detected
        now_ms: u64,
    ) -> RetransmissionDecision {
        match self.strategy {
            LossHandlingStrategy::PreferLatency => {
                // Don't retransmit if the loss is too old to matter
                if now_ms - lost_at_ms > self.max_retransmission_delay_ms as u64 {
                    RetransmissionDecision::Drop
                } else {
                    RetransmissionDecision::None
                }
            }
            LossHandlingStrategy::PreferQuality => {
                // Always try to retransmit
                RetransmissionDecision::Request(sequence_number)
            }
            LossHandlingStrategy::Balanced => {
                // Adaptive based on loss rate
                RetransmissionDecision::None // Placeholder
            }
        }
    }
}

pub enum RetransmissionDecision {
    Request(u16),
    Drop,
    None,
}
```

#### NACK vs FEC Selection

**Recommendation for 15-25ms latency:**

- **Primary**: Limited NACK
  - NACK window: 1-2 frames (16-33ms at 60fps)
  - Max retransmission delay: 20ms
  - Only retransmit keyframes or critical packets

- **Avoid FEC**:
  - Forward Error Correction adds significant latency
  - On a low-loss LAN, FEC overhead outweighs its benefits
  - Use NACK selectively instead

```rust
pub struct NackController {
    window_size_ms: u32, // 20ms window
    max_nack_packets_per_second: u32,
    nack_list: VecDeque<(u16, u64)>, // (seq_num, timestamp_ms)
}

impl NackController {
    pub fn should_send_nack(&self, seq_num: u16, now_ms: u64) -> bool {
        // Check whether the loss is still within the NACK window;
        // a full implementation would also rate-limit and dedupe by seq_num.
        if let Some(&(_, oldest_ts)) = self.nack_list.front() {
            if now_ms - oldest_ts > self.window_size_ms as u64 {
                return false; // Too old
            }
        }

        true
    }
}
```

### 4. Frame Rate and Buffer Strategy

#### Dynamic Frame Rate Adjustment

```rust
pub struct FrameRateController {
    target_fps: u32, // Desired FPS (30-60)
    current_fps: u32,
    frame_times: VecDeque<Instant>,
    last_frame_time: Instant,
    min_interval: Duration, // 1 / max_fps
}

impl FrameRateController {
    pub fn new(target_fps: u32) -> Self {
        let min_interval = Duration::from_micros(1_000_000 / 60); // Cap at 60 FPS

        Self {
            target_fps,
            current_fps: 30,
            frame_times: VecDeque::with_capacity(60),
            last_frame_time: Instant::now(),
            min_interval,
        }
    }

    pub fn should_capture(&mut self) -> bool {
        let now = Instant::now();
        let elapsed = now.duration_since(self.last_frame_time);

        if elapsed < self.min_interval {
            return false; // Too soon
        }

        // Update frame rate based on conditions
        self.adjust_fps_based_on_conditions();

        self.last_frame_time = now;
        true
    }

    pub fn adjust_fps_based_on_conditions(&mut self) {
        // Check system load, network conditions, etc.
        // (get_system_load/get_network_quality are stubs here; a real
        // implementation would sample /proc/loadavg and RTCP statistics.)
        let system_load = self.get_system_load();
        let network_quality = self.get_network_quality();

        if system_load > 0.8 || network_quality.is_poor() {
            self.current_fps = 30; // Reduce frame rate
        } else if system_load < 0.5 && network_quality.is_excellent() {
            self.current_fps = 60; // Increase frame rate
        } else {
            self.current_fps = 45; // Balanced
        }
    }
}
```

#### Fast Frame Dropping Strategy

```rust
pub struct FrameDropper {
    target_fps: u32,
    adaptive_drop_threshold_ms: u32,
    consecutive_drops: u32,
    max_consecutive_drops: u32,
}

impl FrameDropper {
    pub fn should_drop(&mut self, queue_latency_ms: u32) -> bool {
        if queue_latency_ms > self.adaptive_drop_threshold_ms {
            if self.consecutive_drops < self.max_consecutive_drops {
                self.consecutive_drops += 1;
                return true;
            }
        }

        self.consecutive_drops = 0;
        false
    }

    pub fn get_drop_interval(&self) -> u32 {
        // How many frames to drop per frame kept
        match self.target_fps {
            60 => 1, // Drop 1 out of every 2
            30 => 1, // Drop 1 out of every 2
            _ => 0,
        }
    }
}
```

#### Minimal Buffering

**Sender Side:**

```rust
pub struct SenderBuffer {
    max_size_frames: usize, // Very small: 1-2 frames
    queue: VecDeque<EncodedFrame>,
    target_latency_ms: u32,
}

impl SenderBuffer {
    pub fn new() -> Self {
        Self {
            max_size_frames: 1, // Single-frame buffer
            queue: VecDeque::with_capacity(2),
            target_latency_ms: 5, // 5ms target
        }
    }

    pub fn push(&mut self, frame: EncodedFrame) -> Result<()> {
        if self.queue.len() >= self.max_size_frames {
            // Drop the oldest frame to maintain low latency
            self.queue.pop_front();
        }

        self.queue.push_back(frame);
        Ok(())
    }

    pub fn pop(&mut self) -> Option<EncodedFrame> {
        self.queue.pop_front()
    }
}
```

**Receiver Side (Jitter Buffer):**

```rust
pub struct MinimalJitterBuffer {
    target_delay_ms: u32, // 0-10ms
    min_delay_ms: u32,    // 0ms
    max_delay_ms: u32,    // 10-20ms
    packets: VecDeque<RtpPacket>,
}

impl MinimalJitterBuffer {
    pub fn new() -> Self {
        Self {
            target_delay_ms: 5, // 5ms target
            min_delay_ms: 0,    // No minimum
            max_delay_ms: 10,   // 10ms maximum
            packets: VecDeque::with_capacity(10),
        }
    }

    pub fn push(&mut self, packet: RtpPacket) {
        // Crude capacity heuristic: roughly one packet per 2ms of allowed delay
        if self.packets.len() < self.max_delay_ms as usize / 2 {
            self.packets.push_back(packet);
        } else {
            // Buffer full - drop the oldest
            self.packets.pop_front();
            self.packets.push_back(packet);
        }
    }

    pub fn pop(&mut self) -> Option<RtpPacket> {
        self.packets.pop_front()
    }
}
```

### 5. Architecture Adjustments

#### Single-Threaded vs Multi-Threaded

**Recommendation: Hybrid Approach**

- **Capture Thread**: Dedicated thread for PipeWire
- **Encoder Thread**: Per-session encoder thread
- **Network Thread**: WebRTC transport thread
- **Coordination**: Lock-free channels for data passing

```rust
pub struct PipelineArchitecture {
    capture_thread: JoinHandle<()>,
    encoder_threads: Vec<JoinHandle<()>>,
    network_thread: JoinHandle<()>,

    // Lock-free communication
    capture_to_encoder: async_channel::Sender<CapturedFrame>,
    encoder_to_network: async_channel::Sender<EncodedFrame>,
}
```

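The hand-off between the three threads can be sketched with standard-library primitives. Here bounded `std::sync::mpsc` channels of depth 1 stand in for `async_channel` purely for illustration, so the back-pressure behavior (a slow encoder stalls capture rather than growing a queue) can be run standalone:

```rust
use std::sync::mpsc;
use std::thread;

// Run a toy capture -> encode -> network pipeline for `frames` frames and
// return what the network stage would have sent.
fn run_pipeline(frames: u32) -> Vec<String> {
    let (cap_tx, cap_rx) = mpsc::sync_channel::<u32>(1);    // capture -> encoder
    let (enc_tx, enc_rx) = mpsc::sync_channel::<String>(1); // encoder -> network

    let capture = thread::spawn(move || {
        for frame_id in 0..frames {
            cap_tx.send(frame_id).unwrap(); // blocks if the encoder falls behind
        }
        // cap_tx dropped here, so the encoder's recv() eventually errors out
    });

    let encoder = thread::spawn(move || {
        while let Ok(frame_id) = cap_rx.recv() {
            enc_tx.send(format!("encoded-{frame_id}")).unwrap();
        }
    });

    let network = thread::spawn(move || enc_rx.iter().collect::<Vec<_>>());

    capture.join().unwrap();
    encoder.join().unwrap();
    network.join().unwrap()
}

fn main() {
    let sent = run_pipeline(3);
    assert_eq!(sent, vec!["encoded-0", "encoded-1", "encoded-2"]);
    println!("sent: {sent:?}");
}
```

In the real pipeline a full channel would trigger the frame dropper instead of blocking, but the shutdown-by-channel-close pattern carries over directly.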
#### Lock Competition Minimization

```rust
// Use lock-free data structures where possible
use crossbeam::queue::SegQueue;

pub struct LockFreeFrameQueue {
    queue: SegQueue<CapturedFrame>,
    max_size: usize,
}

impl LockFreeFrameQueue {
    pub fn push(&self, frame: CapturedFrame) -> Result<()> {
        if self.queue.len() >= self.max_size {
            return Err(Error::QueueFull);
        }
        self.queue.push(frame);
        Ok(())
    }

    pub fn pop(&self) -> Option<CapturedFrame> {
        self.queue.pop()
    }
}
```

#### Async Task Scheduling

```rust
pub struct LowLatencyScheduler {
    capture_priority: TaskPriority,
    encode_priority: TaskPriority,
    network_priority: TaskPriority,
}

impl LowLatencyScheduler {
    pub async fn schedule_pipeline(&self) {
        // Tokio has no built-in task priorities; in practice the critical
        // path runs on a dedicated runtime (or OS threads with elevated
        // scheduling priority) while background work stays on the default
        // runtime.
        tokio::spawn(async move {
            // Critical path: capture -> encode -> send
        });

        tokio::spawn(async move {
            // Background tasks: statistics, logging
        });
    }
}
```

### 6. Technology Stack Adjustments

#### Encoder Selection for Latency

| Encoder | Setup Latency | Per-Frame Latency | Quality | Recommendation |
|---------|---------------|-------------------|---------|----------------|
| VA-API H.264 | 1-2ms | 2-3ms | Medium | Primary (Linux) |
| NVENC H.264 | 1-2ms | 1-2ms | High | Primary (NVIDIA) |
| x264 (ultrafast) | 0ms | 5-8ms | Low | Fallback |
| x264 (superfast) | 0ms | 8-12ms | Medium | Fallback |

**Recommendation:**
- **Primary**: VA-API or NVENC H.264 with ultrafast preset
- **Fallback**: x264 with ultrafast preset (accept 30-50ms latency)

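The fallback chain above reduces to a small selection function. In this sketch the availability flags are stand-ins for real probes (opening a DRM render node, querying libva or NVML), which are assumptions here, not project API:

```rust
#[derive(Debug, PartialEq)]
enum EncoderKind {
    Nvenc,
    Vaapi,
    X264,
}

// Pick the first available encoder in latency order: NVENC -> VA-API -> x264.
fn select_encoder(nvenc_available: bool, vaapi_available: bool) -> EncoderKind {
    if nvenc_available {
        EncoderKind::Nvenc // lowest per-frame latency when present
    } else if vaapi_available {
        EncoderKind::Vaapi
    } else {
        EncoderKind::X264 // software fallback: accept higher latency
    }
}

fn main() {
    assert_eq!(select_encoder(true, true), EncoderKind::Nvenc);
    assert_eq!(select_encoder(false, true), EncoderKind::Vaapi);
    assert_eq!(select_encoder(false, false), EncoderKind::X264);
    println!("fallback chain resolves as expected");
}
```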
#### Direct Wayland vs PipeWire

**Use PipeWire (recommended):**
- Better DMA-BUF support
- Hardware acceleration integration
- Zero-copy through the ecosystem

**Direct Wayland (if needed):**
- Lower-level control
- Potentially lower capture latency (0.5-1ms)
- More complex implementation
- Bypasses the portal's consent flow (a security concern)

**Recommendation:** Stick with PipeWire for the MVP. Consider direct Wayland only if PipeWire latency proves unacceptable.

#### webrtc-rs Latency Characteristics

**Pros:**
- Pure Rust, predictable behavior
- Good zero-copy support
- Customizable buffering

**Cons:**
- Default buffer settings are tuned for reliability, not latency
- Needs manual configuration for ultra-low latency

**Custom WebRTC Layer (advanced):**
- Full control over buffering and timing
- Can inline packetization
- More complex implementation

**Recommendation:** Use webrtc-rs with the low-latency configuration. Only consider a custom layer if webrtc-rs cannot hit the latency targets.

### 7. Implementation Priority

#### P0 (Must-Have for MVP)

1. **Hardware Encoder Integration**
   - VA-API H.264 with low-latency settings
   - No B-frames, small GOP (15 frames)
   - Ultrafast preset

2. **DMA-BUF Zero-Copy**
   - PipeWire DMA-BUF import
   - Direct encoder feed
   - No CPU copies

3. **Minimal Buffering**
   - Single-frame sender buffer
   - 0-5ms jitter buffer
   - Fast frame dropping

4. **Low-Latency WebRTC Config**
   - playout_delay_min: 0ms
   - playout_delay_max: 20ms
   - Disable FEC

#### P1 (Important for 15-25ms)

1. **Damage Tracking**
   - Partial region updates
   - Reduced encoding load

2. **Dynamic Frame Rate**
   - 30-60 FPS adaptation
   - Network-aware

3. **NACK Control**
   - Limited retransmission window (20ms)
   - Selective NACK

#### P2 (Nice-to-Have)

1. **Direct Wayland Capture**
   - If PipeWire latency is insufficient

2. **Custom WebRTC Layer**
   - If webrtc-rs is insufficient

3. **Advanced Congestion Control**
   - SCReAM or Google Congestion Control

### 8. Testing and Validation

#### End-to-End Latency Measurement

```rust
pub struct LatencyMeter {
    timestamps: VecDeque<(u64, LatencyStage)>,
}

pub enum LatencyStage {
    Capture,
    EncodeStart,
    EncodeEnd,
    Packetize,
    NetworkSend,
    NetworkReceive,
    Depacketize,
    DecodeStart,
    DecodeEnd,
    Display,
}

impl LatencyMeter {
    pub fn mark(&mut self, stage: LatencyStage) {
        // timestamp_ns(): monotonic nanoseconds,
        // e.g. from clock_gettime(CLOCK_MONOTONIC)
        let now = timestamp_ns();
        self.timestamps.push_back((now, stage));
    }

    pub fn calculate_total_latency(&self) -> Duration {
        if self.timestamps.len() < 2 {
            return Duration::ZERO;
        }

        let first = self.timestamps.front().unwrap().0;
        let last = self.timestamps.back().unwrap().0;
        Duration::from_nanos(last - first)
    }
}
```

**Measurement Method:**

1. **Timestamp Injection**
   - Inject a frame ID at capture (visible timestamp on screen)
   - Capture at the client with a camera
   - Compare timestamps to calculate the round trip
   - Divide by 2 for one-way latency

2. **Network Timestamping**
   - Add the frame capture time in an RTP header extension
   - Measure at the receiver
   - Account for clock skew

3. **Hardware Timestamping**
   - Use kernel packet timestamps (SO_TIMESTAMPING)
   - Hardware NIC timestamps if available

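For the clock-skew step in method 2, an NTP-style offset estimate from one request/response exchange can be folded into the one-way calculation. The timestamps below are illustrative milliseconds, not a real wire format:

```rust
// t1: client send, t2: server receive, t3: server send, t4: client receive.
// Returns the estimated (server clock - client clock) offset, assuming
// symmetric network delay.
fn clock_offset_ms(t1: i64, t2: i64, t3: i64, t4: i64) -> i64 {
    ((t2 - t1) + (t3 - t4)) / 2
}

// One-way latency of a frame stamped on the sender (server) clock and
// received on the receiver (client) clock, corrected by the offset.
fn one_way_latency_ms(sender_ts: i64, receiver_ts: i64, sender_minus_receiver_offset: i64) -> i64 {
    receiver_ts - sender_ts + sender_minus_receiver_offset
}

fn main() {
    // Server clock runs 100ms ahead; the true network delay is 5ms each way.
    let (t1, t2, t3, t4) = (1000, 1105, 1106, 1011);
    let offset = clock_offset_ms(t1, t2, t3, t4);
    assert_eq!(offset, 100);

    // A frame stamped at 2100 (server clock) arrives at 2005 (client clock).
    assert_eq!(one_way_latency_ms(2100, 2005, offset), 5);
    println!("estimated one-way latency: 5ms");
}
```

The symmetric-delay assumption is the usual weak point; on a LAN it is close enough for the 15-25ms budget, and repeated exchanges can be filtered (e.g. minimum RTT) to tighten the estimate.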
#### Performance Benchmarking

```rust
// Sketch using the unstable `test` crate's Bencher; `config` comes from
// the surrounding benchmark setup.
#[bench]
fn bench_full_pipeline_latency(b: &mut Bencher) {
    let mut pipeline = LowLatencyPipeline::new(config).unwrap();
    let mut latencies = Vec::new();

    b.iter(|| {
        let start = Instant::now();

        let frame = pipeline.capture().unwrap();
        let encoded = pipeline.encode(frame).unwrap();
        pipeline.send(encoded).unwrap();

        latencies.push(start.elapsed());
    });

    let avg_latency = latencies.iter().sum::<Duration>() / latencies.len() as u32;
    println!("Average latency: {:?}", avg_latency);
}
```

**Target Benchmarks:**

| Metric | Target | Acceptable |
|--------|--------|------------|
| Capture latency | 2-3ms | <5ms |
| Encode latency | 3-5ms | <8ms |
| Packetize latency | 1-2ms | <3ms |
| Network (LAN) | 0.5-1ms | <2ms |
| Decode latency | 1-2ms | <4ms |
| Display latency | 1-2ms | <4ms |
| **Total** | **15-25ms** | **<30ms** |

#### Tuning Strategy

1. **Baseline Measurement**
   - Measure each stage individually
   - Identify bottlenecks

2. **Iterative Tuning**
   - Tune one parameter at a time
   - Measure the impact on total latency
   - Trade off quality if needed

3. **Validation**
   - Test under various network conditions
   - Test under system load
   - Test with different content (static, dynamic)

4. **Continuous Monitoring**
   - Track latency in production
   - Alert on degradation
   - Adaptive adjustments

---

## Implementation Roadmap (Updated for Low Latency)

### Phase 1: MVP (Minimum Viable Product) - 4-6 weeks

**Goal:** Basic screen capture and WebRTC streaming

**Week 1-2: Core Infrastructure**
- [ ] Project setup (Cargo.toml, directory structure)
- [ ] Tokio async runtime setup
- [ ] Error handling framework (anyhow/thiserror)
- [ ] Logging setup (tracing)
- [ ] Configuration management

**Week 2-3: Wayland Capture**
- [ ] PipeWire xdg-desktop-portal integration
- [ ] Basic screen capture (single monitor)
- [ ] DMA-BUF import/export
- [ ] Frame receiver channel

**Week 3-4: Simple Encoding**
- [ ] x264 software encoder (fallback)
- [ ] Basic frame pipeline (capture → encode)
- [ ] Frame rate limiting

**Week 4-5: WebRTC Transport**
- [ ] webrtc-rs integration
- [ ] Basic peer connection
- [ ] Video track setup
- [ ] Simple signaling (WebSocket)

**Week 5-6: Testing & Integration**
- [ ] End-to-end test (Wayland → WebRTC → Browser)
- [ ] Performance benchmarking
- [ ] Bug fixes

**MVP Deliverables:**
- Working screen capture
- WebRTC streaming to browser
- 15-30 FPS at 720p
- x264 encoding (software)

---

### Phase 2: Hardware Acceleration - 3-4 weeks

**Goal:** GPU-accelerated encoding for better performance

**Week 1-2: VA-API Integration**
- [ ] VA-API encoder implementation
- [ ] DMA-BUF to VA-API surface import
- [ ] H.264 encoding
- [ ] Intel/AMD GPU support

**Week 2-3: NVENC Integration**
- [ ] NVENC encoder for NVIDIA GPUs
- [ ] CUDA memory management
- [ ] NVENC H.264 encoding

**Week 3-4: Encoder Selection**
- [ ] Encoder detection and selection
- [ ] Fallback chain (NVENC → VA-API → x264)
- [ ] Encoder switching at runtime

**Phase 2 Deliverables:**
- GPU-accelerated encoding
- 30-60 FPS at 1080p
- Lower CPU usage
- Adaptive encoder selection

---

### Phase 3: Low Latency Optimization - 4-5 weeks

**Goal:** Achieve 25-35ms latency on local networks

**Week 1: Encoder Low-Latency Configuration (P0)**
- [ ] Configure VA-API/NVENC for <5ms encoding
- [ ] Disable B-frames, set GOP to 15 frames
- [ ] Implement CBR rate control with a small VBV buffer
- [ ] Tune encoder preset (ultrafast/superfast)
- [ ] Measure encoder latency independently

**Week 2: Minimal Buffering (P0)**
- [ ] Reduce sender buffer to 1 frame
- [ ] Implement 0-10ms jitter buffer
- [ ] Configure WebRTC playout delay (0-20ms)
- [ ] Disable FEC for latency
- [ ] Test end-to-end latency

**Week 3: Damage Tracking & Partial Updates (P1)**
- [ ] Implement region change detection
- [ ] Add partial region encoding
- [ ] Optimize for static content
- [ ] Benchmark latency improvements

**Week 4: Dynamic Frame Rate & Quality (P1)**
- [ ] Implement adaptive frame rate (30-60fps)
- [ ] Network quality detection
- [ ] Dynamic bitrate vs latency trade-off
- [ ] Fast frame dropping under load

**Week 5: Advanced Optimizations (P1/P2)**
- [ ] Limited NACK window (20ms)
- [ ] Selective packet retransmission
- [ ] RTCP fine-tuning (50ms intervals)
- [ ] Performance profiling
- [ ] Final latency tuning

**Phase 3 Deliverables:**
- 25-35ms latency on LAN
- Zero-copy DMA-BUF pipeline
- Hardware encoder with low-latency config
- Minimal buffering throughout the pipeline
- Adaptive quality based on conditions

---

### Phase 4: Production Ready with Ultra Low Latency - 5-7 weeks

**Goal:** Achieve 15-25ms latency while ensuring security, reliability, and deployment readiness

**Week 1-2: Ultra Low Latency Tuning (P1/P2)**
- [ ] Direct Wayland capture evaluation (if needed)
- [ ] Custom WebRTC layer evaluation (if needed)
- [ ] Advanced congestion control (SCReAM/Google CC)
- [ ] Kernel bypass optimization (DPDK/AF_XDP if needed)
- [ ] Final latency optimization and tuning

**Week 2-3: Security**
- [ ] Authentication (JWT, OAuth)
- [ ] Encryption (TLS, DTLS)
- [ ] Session management
- [ ] Access control
- [ ] Security audit and penetration testing

**Week 3-4: Reliability**
- [ ] Error recovery
- [ ] Connection health monitoring
- [ ] Automatic reconnection
- [ ] Graceful degradation with latency awareness
- [ ] Failover mechanisms

**Week 4-5: Monitoring & Debugging**
- [ ] Real-time latency metrics collection
- [ ] Per-stage latency tracking
- [ ] Logging improvements
- [ ] Debug mode with frame inspection
- [ ] Performance dashboard with latency visualization
- [ ] Alerting for latency degradation

**Week 5-6: Deployment**
- [ ] Docker containerization
- [ ] Systemd service
- [ ] Configuration file with low-latency presets
- [ ] Installation scripts
- [ ] Performance tuning documentation

**Week 6-7: Testing**
- [ ] Integration tests
- [ ] Load testing with latency monitoring
- [ ] Cross-browser testing
- [ ] Long-running stability tests
- [ ] Latency regression tests
- [ ] Automated performance benchmarks

**Phase 4 Deliverables:**
- 15-25ms latency on LAN
- Production-ready deployment
- Security features
- Monitoring and observability
- Comprehensive testing
- Latency regression testing

---

### Phase 5: Advanced Features (Optional) - Ongoing

**Potential Features:**
- [ ] Audio capture and streaming
- [ ] Bidirectional input (mouse, keyboard)
- [ ] Clipboard sharing
- [ ] File transfer
- [ ] Recording (save sessions)
- [ ] Multi-user sessions
- [ ] Mobile client support
- [ ] WebRTC data channels for control
- [ ] WebRTC insertable streams (client-side effects)
- [ ] Adaptive resolution
- [ ] H.265/HEVC encoding
- [ ] AV1 encoding
- [ ] Screen region selection
- [ ] Virtual display support
- [ ] Wayland virtual pointer protocol

---

### Testing Strategy

#### Unit Tests

```rust
#[cfg(test)]
mod tests {
    use super::*;

    #[tokio::test]
    async fn test_dma_buf_lifecycle() {
        let handle = DmaBufHandle::new(/* ... */);
        assert_eq!(handle.ref_count(), 1);

        let handle2 = handle.clone();
        assert_eq!(handle.ref_count(), 2);

        drop(handle);
        assert_eq!(handle2.ref_count(), 1);

        drop(handle2);
        // Buffer is freed when the last handle is dropped
    }

    #[tokio::test]
    async fn test_encoder_pipeline() {
        let config = EncoderConfig {
            encoder_type: EncoderType::H264_X264,
            bitrate: 2_000_000,
            keyframe_interval: 30,
            preset: EncodePreset::Fast,
        };

        let mut encoder = X264Encoder::new(config).unwrap();

        let frame = create_test_frame(1920, 1080);
        let encoded = encoder.encode(frame).await.unwrap();

        assert!(!encoded.data.is_empty());
        assert!(encoded.is_keyframe);
    }
}
```

#### Integration Tests

```rust
#[tokio::test]
async fn test_full_pipeline() {
    // Setup
    let capture = WaylandCapture::new(CaptureConfig::default()).await.unwrap();
    let mut encoder = VaapiEncoder::new(EncoderConfig::default()).unwrap();
    let webrtc = WebRtcServer::new(WebRtcConfig::default()).await.unwrap();

    // Run the pipeline for 100 frames
    for _ in 0..100 {
        let frame = capture.next_frame().await.unwrap();
        let encoded = encoder.encode(frame).await.unwrap();
        webrtc.send_video_frame("test-session", encoded).await.unwrap();
    }

    // Verify
    assert_eq!(webrtc.frames_sent(), 100);
}
```

#### Load Testing

```bash
# Simulate 10 concurrent connections
for i in {1..10}; do
  cargo test test_full_pipeline --release &
done
wait
```

#### Performance Benchmarks

```rust
#[bench]
fn bench_encode_frame(b: &mut Bencher) {
    let rt = tokio::runtime::Runtime::new().unwrap();
    let mut encoder = X264Encoder::new(EncoderConfig::default()).unwrap();
    let frame = create_test_frame(1920, 1080);

    b.iter(|| {
        // encode() is async, so block on it inside the benchmark loop
        rt.block_on(encoder.encode(frame.clone())).unwrap()
    });
}
```

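The benchmark above measures throughput, but the latency regression tests in the roadmap should gate on tail latency, since occasional slow frames dominate perceived responsiveness more than the mean. A minimal, self-contained sketch of nearest-rank percentile aggregation over per-frame latency samples (the helper name and the 25 ms threshold mirror this document's LAN budget; nothing here is existing project code):

```rust
/// Nearest-rank percentile (pct in 0-100) of latency samples, in milliseconds.
fn latency_percentile(samples: &mut Vec<f64>, pct: f64) -> f64 {
    samples.sort_by(|a, b| a.partial_cmp(b).unwrap());
    let idx = ((pct / 100.0) * (samples.len() - 1) as f64).round() as usize;
    samples[idx]
}

fn main() {
    // Simulated per-frame end-to-end latencies (ms): values 15.0..=24.0
    let mut samples: Vec<f64> = (0..100).map(|i| 15.0 + (i % 10) as f64).collect();

    let p50 = latency_percentile(&mut samples, 50.0);
    let p99 = latency_percentile(&mut samples, 99.0);
    println!("p50 = {p50} ms, p99 = {p99} ms");

    // A regression test would assert that p99 stays within the 25 ms LAN budget
    assert!(p99 <= 25.0);
}
```

A CI job can run this over recorded pipeline timestamps and fail the build when the p99 drifts past the budget, which catches regressions that a mean-latency check would hide.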
---

## Potential Challenges & Solutions

### 1. Wayland Protocol Limitations

**Challenge:** Wayland's security model restricts screen capture.

**Solution:**
- Use xdg-desktop-portal for permission management
- Implement user prompts for capture authorization
- Support multiple portal backends (GNOME, KDE, etc.)

```rust
pub async fn request_capture_permission() -> Result<bool> {
    let portal = Portal::new().await?;
    let session = portal.create_session(ScreenCaptureType::Monitor).await?;

    // The user sees a dialog asking for permission
    let sources = portal.request_sources(&session).await?;

    Ok(!sources.is_empty())
}
```

**Alternative:** Use PipeWire directly with proper authentication.

### 2. Hardware Acceleration Compatibility

**Challenge:** Different GPUs require different APIs (VA-API, NVENC, etc.).

**Solution:**
- Implement multiple encoder backends
- Runtime detection of available encoders
- Graceful fallback to software encoding

```rust
pub fn detect_best_encoder() -> EncoderType {
    // Try NVENC first (NVIDIA)
    if nvenc::is_available() {
        return EncoderType::H264_NVENC;
    }

    // Try VA-API (Intel/AMD)
    if vaapi::is_available() {
        return EncoderType::H264_VAAPI;
    }

    // Fall back to software encoding
    EncoderType::H264_X264
}
```

### 3. Cross-Browser WebRTC Compatibility

**Challenge:** Different browsers have different WebRTC implementations.

**Solution:**
- Use standardized codecs (H.264, VP8, VP9)
- Implement codec negotiation
- Provide fallback options

```rust
pub fn get_supported_codecs() -> Vec<RTCRtpCodecCapability> {
    vec![
        RTCRtpCodecCapability {
            mime_type: "video/H264".to_string(),
            clock_rate: 90000,
            ..Default::default()
        },
        RTCRtpCodecCapability {
            mime_type: "video/VP9".to_string(),
            clock_rate: 90000,
            ..Default::default()
        },
    ]
}
```

### 4. Security and Authentication

**Challenge:** Secure remote access without exposing the desktop to unauthorized users.

**Solution:**
- Implement JWT-based authentication
- Use DTLS for media encryption
- Add rate limiting and access control

```rust
pub struct AuthManager {
    secret: String,
    sessions: Arc<Mutex<HashMap<String, Session>>>,
}

impl AuthManager {
    pub fn create_token(&self, user_id: &str) -> Result<String> {
        let claims = Claims {
            sub: user_id.to_string(),
            // jsonwebtoken expects `exp` as a numeric Unix timestamp
            exp: (Utc::now() + chrono::Duration::hours(1)).timestamp() as usize,
        };

        encode(&Header::default(), &claims, &EncodingKey::from_secret(self.secret.as_ref()))
            .map_err(|_| AuthError::TokenCreation)
    }

    pub fn validate_token(&self, token: &str) -> Result<Claims> {
        decode::<Claims>(
            token,
            &DecodingKey::from_secret(self.secret.as_ref()),
            &Validation::default(),
        )
        .map(|data| data.claims)
        .map_err(|_| AuthError::InvalidToken)
    }
}
```

### 5. Memory Management

**Challenge:** Avoid memory leaks with DMA-BUF and shared memory.

**Solution:**
- Use Rust's ownership system
- RAII patterns for resource cleanup
- Buffer pools with limits

```rust
pub struct ScopedDmaBuf {
    handle: DmaBufHandle,
}

impl Drop for ScopedDmaBuf {
    fn drop(&mut self) {
        // Automatically release the DMA-BUF:
        // the file descriptor is closed and the GPU memory freed
    }
}

// Scoped usage guarantees cleanup
{
    let buf = ScopedDmaBuf::new(/* ... */);
    // Use buffer
} // Automatically dropped here
```

### 6. Latency Optimization

**Challenge:** Minimize end-to-end latency.

**Solution:**
- Zero-copy pipeline
- Hardware acceleration
- Adaptive quality
- Frame skipping

```rust
pub struct LatencyOptimizer {
    target_latency_ms: u32,
    current_latency_ms: u32,
}

impl LatencyOptimizer {
    pub fn adjust_parameters(&mut self) {
        if self.current_latency_ms > self.target_latency_ms {
            // Reduce quality to improve latency
            self.reduce_bitrate();
            self.increase_frame_skipping();
        } else {
            // Headroom available: increase quality
            self.increase_bitrate();
        }
    }
}
```

### 7. Network Conditions

**Challenge:** Varying bandwidth and network conditions.

**Solution:**
- Adaptive bitrate streaming
- Multiple quality presets
- Congestion control

```rust
pub struct BandwidthMonitor {
    measurements: VecDeque<u32>,
    window_size: usize,
}

impl BandwidthMonitor {
    pub fn update(&mut self, bytes_sent: u32, duration: Duration) {
        // Compute bits/s from milliseconds: as_secs() truncates sub-second
        // intervals to 0 and would divide by zero
        let millis = duration.as_millis().max(1) as u64;
        let bandwidth = (bytes_sent as u64 * 8 * 1000 / millis) as u32;
        self.measurements.push_back(bandwidth);

        if self.measurements.len() > self.window_size {
            self.measurements.pop_front();
        }
    }

    pub fn average_bandwidth(&self) -> u32 {
        if self.measurements.is_empty() {
            return 0;
        }

        self.measurements.iter().sum::<u32>() / self.measurements.len() as u32
    }
}
```

### 8. Cross-Platform Compatibility

**Challenge:** Support different Linux distributions and desktop environments.

**Solution:**
- Containerize the application
- Detect available technologies at runtime
- Provide fallback options

```rust
pub fn detect_desktop_environment() -> DesktopEnvironment {
    if std::path::Path::new("/usr/bin/gnome-shell").exists() {
        DesktopEnvironment::GNOME
    } else if std::path::Path::new("/usr/bin/plasmashell").exists() {
        DesktopEnvironment::KDE
    } else {
        DesktopEnvironment::Other
    }
}

pub fn configure_portal_for_env(env: DesktopEnvironment) -> PortalConfig {
    match env {
        DesktopEnvironment::GNOME => PortalConfig::gnome(),
        DesktopEnvironment::KDE => PortalConfig::kde(),
        DesktopEnvironment::Other => PortalConfig::generic(),
    }
}
```

### 9. Debugging and Troubleshooting

**Challenge:** Debugging a complex pipeline with multiple components.

**Solution:**
- Comprehensive logging
- Metrics collection
- Debug mode with frame inspection

```rust
pub struct DebugLogger {
    enabled: bool,
    output: DebugOutput,
}

pub enum DebugOutput {
    Console,
    File(PathBuf),
    Both,
}

impl DebugLogger {
    pub fn log_frame(&self, frame: &CapturedFrame) {
        if !self.enabled {
            return;
        }

        tracing::debug!(
            "Frame: {}x{}, format: {:?}, timestamp: {}",
            frame.width,
            frame.height,
            frame.format,
            frame.timestamp
        );
    }

    pub fn log_encoding(&self, encoded: &EncodedFrame) {
        if !self.enabled {
            return;
        }

        tracing::debug!(
            "Encoded: {} bytes, keyframe: {}, seq: {}",
            encoded.data.len(),
            encoded.is_keyframe,
            encoded.sequence_number
        );
    }
}
```

### 10. Resource Limits

**Challenge:** Prevent resource exhaustion (CPU, memory, GPU).

**Solution:**
- Limit concurrent sessions
- Monitor resource usage
- Implement graceful degradation

```rust
pub struct ResourceManager {
    max_sessions: usize,
    active_sessions: Arc<Mutex<HashSet<String>>>,
    cpu_threshold: f32,
    memory_threshold: u64,
}

impl ResourceManager {
    pub async fn can_create_session(&self) -> bool {
        let sessions = self.active_sessions.lock().await;

        if sessions.len() >= self.max_sessions {
            return false;
        }

        if self.cpu_usage() > self.cpu_threshold {
            return false;
        }

        if self.memory_usage() > self.memory_threshold {
            return false;
        }

        true
    }
}
```

---

## Code Examples

### Main Application Entry Point

```rust
// src/main.rs
mod capture;
mod encoder;
mod webrtc;
mod buffer;
mod ipc;
mod config;

use anyhow::Result;
use tracing::{info, error};

#[tokio::main]
async fn main() -> Result<()> {
    // Initialize logging
    tracing_subscriber::fmt()
        .with_max_level(tracing::Level::INFO)
        .init();

    // Load configuration
    let config = config::load_config("config.toml")?;

    info!("Starting Wayland WebRTC Backend");
    info!("Configuration: {:?}", config);

    // Initialize components
    let capture = capture::WaylandCapture::new(config.capture).await?;
    let mut encoder = encoder::create_encoder(config.encoder)?;
    let webrtc = webrtc::WebRtcServer::new(config.webrtc).await?;

    // Create video track
    let video_track = webrtc::create_video_track()?;

    // Run capture pipeline
    let session_id = uuid::Uuid::new_v4().to_string();
    webrtc.create_peer_connection(session_id.clone(), video_track).await?;

    // Main loop: capture -> encode -> send
    loop {
        match capture.next_frame().await {
            Ok(frame) => {
                match encoder.encode(frame).await {
                    Ok(encoded) => {
                        if let Err(e) = webrtc.send_video_frame(&session_id, encoded).await {
                            error!("Failed to send frame: {}", e);
                        }
                    }
                    Err(e) => {
                        error!("Failed to encode frame: {}", e);
                    }
                }
            }
            Err(e) => {
                error!("Failed to capture frame: {}", e);
            }
        }
    }
}
```

### Configuration Example

```toml
# config.toml - Low Latency Configuration

[server]
address = "0.0.0.0:8080"
max_sessions = 10
log_level = "info"

[capture]
frame_rate = 60
quality = "high"
# TOML has no null: omit screen_region to capture the full screen
# screen_region = { x = 0, y = 0, width = 1920, height = 1080 }

# Low-latency capture settings
zero_copy = true              # Always use DMA-BUF zero-copy
track_damage = true           # Enable damage tracking
partial_updates = true        # Encode only damaged regions
buffer_pool_size = 3          # Small buffer pool for low latency

[encoder]
type = "auto"                 # Options: "vaapi", "nvenc", "x264", "auto"
bitrate = 4000000
keyframe_interval = 15        # Short GOP for low latency
preset = "ultrafast"          # Ultrafast for minimal latency

# Low-latency specific settings
[encoder.low_latency]
gop_size = 15                 # Short GOP
b_frames = 0                  # No B-frames for low latency
lookahead = 0                 # Minimal lookahead
rc_mode = "CBR"               # Constant bitrate for predictable latency
vbv_buffer_size = 4000000     # 1 second VBV buffer at 4 Mbps
max_bitrate = 4000000         # Tight max constraint
intra_period = 15             # Keyframe interval

# Quality vs latency presets
[encoder.quality_presets]
low = { bitrate = 1000000, preset = "ultrafast", gop_size = 30 }
medium = { bitrate = 2000000, preset = "ultrafast", gop_size = 20 }
high = { bitrate = 4000000, preset = "ultrafast", gop_size = 15 }
ultra = { bitrate = 8000000, preset = "veryfast", gop_size = 15 }

[webrtc]
stun_servers = ["stun:stun.l.google.com:19302"]
turn_servers = []

[webrtc.low_latency]
# Critical: minimal playout delay
playout_delay_min_ms = 0      # No minimum delay
playout_delay_max_ms = 20     # 20ms maximum delay

# Bitrate settings
max_bitrate = 8000000         # 8 Mbps max
min_bitrate = 500000          # 500 Kbps min
start_bitrate = 4000000       # 4 Mbps start

# Packetization
rtp_payload_size = 1200       # Smaller packets for lower latency
packetization_mode = "non_interleaved"

# Retransmission settings
nack_enabled = true           # Enable NACK
nack_window_size_ms = 20      # Only request retransmission for recent packets
max_nack_packets_per_second = 50
fec_enabled = false           # Disable FEC for low latency

# Congestion control
transport_cc_enabled = true   # Enable transport-wide congestion control

# RTCP settings
rtcp_report_interval_ms = 50  # Frequent RTCP feedback

[webrtc.ice]
transport_policy = "all"
candidate_network_types = ["udp", "tcp"]

[buffer]
# Minimal buffering for low latency
dma_buf_pool_size = 3             # Small pool
encoded_buffer_pool_size = 5      # Very small pool
sender_buffer_max_frames = 1      # Single-frame sender buffer
jitter_buffer_target_delay_ms = 5 # 5ms target jitter buffer
jitter_buffer_max_delay_ms = 10   # 10ms maximum jitter buffer

[latency]
# Latency targets
target_latency_ms = 25            # Target end-to-end latency
max_acceptable_latency_ms = 50    # Maximum acceptable latency

# Adaptive settings
adaptive_frame_rate = true
adaptive_bitrate = true
fast_frame_drop = true

# Frame dropping strategy
consecutive_drop_limit = 3        # Max consecutive drops
drop_threshold_ms = 5             # Drop if queue latency exceeds this

[monitoring]
enable_metrics = true
enable_latency_tracking = true
metrics_port = 9090
```

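The `quality_presets` table in the configuration implies a selection rule for adaptive bitrate: pick the richest preset the measured bandwidth can sustain. A minimal sketch of that rule (the struct, the 80% headroom factor, and `select_preset` are illustrative assumptions, not existing project code):

```rust
#[derive(Debug)]
struct Preset {
    name: &'static str,
    bitrate: u32,
    gop_size: u32,
}

// Mirrors [encoder.quality_presets] from config.toml above, lowest to highest.
const PRESETS: &[Preset] = &[
    Preset { name: "low", bitrate: 1_000_000, gop_size: 30 },
    Preset { name: "medium", bitrate: 2_000_000, gop_size: 20 },
    Preset { name: "high", bitrate: 4_000_000, gop_size: 15 },
    Preset { name: "ultra", bitrate: 8_000_000, gop_size: 15 },
];

/// Pick the highest preset whose bitrate fits within ~80% of the measured
/// bandwidth, leaving headroom for RTP/RTCP overhead and bursts
/// (the 80% factor is an assumption, not a tuned value).
fn select_preset(available_bps: u32) -> &'static Preset {
    let budget = (available_bps as f64 * 0.8) as u32;
    PRESETS
        .iter()
        .rev()
        .find(|p| p.bitrate <= budget)
        .unwrap_or(&PRESETS[0]) // below every preset: degrade to "low"
}

fn main() {
    assert_eq!(select_preset(10_000_000).name, "ultra");
    assert_eq!(select_preset(3_000_000).name, "medium");
    assert_eq!(select_preset(800_000).name, "low");
    println!("preset for 3 Mbps link: {:?}", select_preset(3_000_000));
}
```

In the running pipeline this would be driven by the `BandwidthMonitor` average, with hysteresis so the encoder is not reconfigured on every measurement.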
---

## Performance Targets

### Latency Targets

#### Local Network (LAN)

| Tier | End-to-End Latency |
|------|--------------------|
| Excellent | 15-25ms |
| Core Target | 25-35ms |
| Minimum Acceptable | <50ms |

Perceived smoothness by latency:

| Latency | User Experience |
|---------|-----------------|
| <16ms | Imperceptible |
| 16-33ms | Very smooth |
| 33-50ms | Good |
| 50-100ms | Acceptable |

#### Internet

| Scenario | Target | Acceptable |
|----------|--------|------------|
| Excellent | 40-60ms | <80ms |
| Good | 60-80ms | <100ms |

### Performance Metrics by Phase

| Metric | MVP | Phase 2 | Phase 3 | Phase 4 |
|--------|-----|---------|---------|---------|
| FPS (LAN) | 30 | 60 | 60 | 60 |
| FPS (Internet) | 15-20 | 30 | 30-60 | 60 |
| Resolution | 720p | 1080p | 1080p/4K | 1080p/4K |
| Latency (LAN) | <100ms | <50ms | 25-35ms | 15-25ms |
| Latency (Internet) | <200ms | <100ms | 60-80ms | 40-60ms |
| CPU Usage | 20-30% | 10-15% | 5-10% | <5% |
| Memory Usage | 150MB | 250MB | 400MB | <400MB |
| Bitrate | 2-4 Mbps | 4-8 Mbps | Adaptive | Adaptive |
| Concurrent Sessions | 1 | 3-5 | 5-10 | 10+ |

### Latency Budget Allocation (15-25ms Target)

| Component | Time (ms) | Percentage | Optimization Strategy |
|-----------|-----------|------------|-----------------------|
| Wayland Capture | 2-3 | 12-15% | DMA-BUF zero-copy, partial updates |
| Encoder | 3-5 | 20-25% | Hardware encoder, no B-frames |
| Packetization | 1-2 | 6-10% | Inline RTP, minimal buffering |
| Network (LAN) | 0.5-1 | 3-5% | UDP direct path, kernel bypass |
| Jitter Buffer | 0-2 | 0-10% | Minimal buffer, predictive jitter |
| Decoder | 1-2 | 6-10% | Hardware acceleration |
| Display | 1-2 | 6-10% | vsync bypass, direct scanout |
| **Total** | **15-25** | **100%** | |

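Verifying this budget in practice requires attributing measured latency to each stage, which is what the roadmap's per-stage latency tracking item calls for. A minimal, self-contained sketch of such a tracker (the `StageTimer` type and its API are hypothetical, not part of the described codebase):

```rust
use std::time::Instant;

/// Records the elapsed time of each pipeline stage for a single frame so the
/// per-stage contribution to end-to-end latency can be compared to the budget.
struct StageTimer {
    start: Instant,
    last: Instant,
    marks: Vec<(&'static str, f64)>, // (stage name, elapsed ms since previous mark)
}

impl StageTimer {
    fn new() -> Self {
        let now = Instant::now();
        Self { start: now, last: now, marks: Vec::new() }
    }

    /// Call when a stage completes; records time since the previous mark.
    fn mark(&mut self, stage: &'static str) {
        let now = Instant::now();
        let ms = now.duration_since(self.last).as_secs_f64() * 1000.0;
        self.marks.push((stage, ms));
        self.last = now;
    }

    /// Total time from construction to the last mark, in milliseconds.
    fn total_ms(&self) -> f64 {
        self.last.duration_since(self.start).as_secs_f64() * 1000.0
    }
}

fn main() {
    let mut t = StageTimer::new();
    // In the real pipeline these marks would wrap capture/encode/send calls.
    t.mark("capture");
    t.mark("encode");
    t.mark("packetize");

    for (stage, ms) in &t.marks {
        println!("{stage}: {ms:.3} ms");
    }
    assert_eq!(t.marks.len(), 3);
    assert!(t.total_ms() >= 0.0);
}
```

Exporting these per-stage values through the metrics endpoint makes it obvious which row of the budget table a regression lands in.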
---

## Conclusion

This design provides a comprehensive blueprint for building an ultra-low-latency Wayland → WebRTC remote desktop backend in Rust. Key highlights:

1. **Zero-Copy Architecture**: Minimizes CPU copies through DMA-BUF and reference-counted buffers, achieving <5ms copy overhead
2. **Hardware Acceleration**: VA-API/NVENC encoders configured for <5ms encoding latency
3. **Minimal Buffering**: Single-frame sender buffer and 0-10ms jitter buffer throughout the pipeline
4. **Low-Latency WebRTC**: Custom configuration with 0-20ms playout delay, no FEC, limited NACK
5. **Performance**: Targets 15-25ms latency on local networks at 60 FPS
6. **Adaptive Quality**: Dynamic frame rate (30-60fps) and bitrate adjustment based on network conditions
7. **Damage Tracking**: Partial region updates for static content to reduce encoding load

**Latency Budget Breakdown (15-25ms target):**
- Capture: 2-3ms (DMA-BUF zero-copy)
- Encoder: 3-5ms (hardware, no B-frames)
- Packetization: 1-2ms (inline RTP)
- Network: 0.5-1ms (LAN)
- Jitter Buffer: 0-2ms (minimal)
- Decoder: 1-2ms (hardware)
- Display: 1-2ms (vsync bypass)

The phased implementation approach allows for incremental development and testing:
- **Phase 1 (4-6 weeks)**: MVP with <100ms latency
- **Phase 2 (3-4 weeks)**: Hardware acceleration, <50ms latency
- **Phase 3 (4-5 weeks)**: Low-latency optimizations, 25-35ms latency
- **Phase 4 (5-7 weeks)**: Ultra-low latency tuning, 15-25ms latency

Critical P0 optimizations for achieving 15-25ms latency:
1. Hardware encoder with zero B-frames and a 15-frame GOP
2. DMA-BUF zero-copy capture pipeline
3. Minimal buffering (1-frame sender buffer, 0-10ms jitter buffer)
4. WebRTC low-latency configuration (0-20ms playout delay)

---

## Additional Resources

### Wayland & PipeWire
- [Wayland Protocol](https://wayland.freedesktop.org/docs/html/)
- [PipeWire Documentation](https://docs.pipewire.org/)
- [xdg-desktop-portal](https://flatpak.github.io/xdg-desktop-portal/)

### WebRTC
- [WebRTC Specifications](https://www.w3.org/TR/webrtc/)
- [webrtc-rs](https://github.com/webrtc-rs/webrtc)
- [WebRTC for the Curious](https://webrtcforthecurious.com/)

### Video Encoding
- [VA-API](https://github.com/intel/libva)
- [NVENC](https://developer.nvidia.com/nvidia-video-codec-sdk)
- [x264](https://www.videolan.org/developers/x264.html)

### Rust
- [Tokio](https://tokio.rs/)
- [Bytes](https://docs.rs/bytes/)
- [Async Rust Book](https://rust-lang.github.io/async-book/)