Files
wl-webrtc/.sisyphus/BUGS.md

15 KiB

wl-webrtc Bug Analysis Report

Generated by Oracle audit on 2026-05-22. Total: 25 bugs.

Severity Legend

  • 🔴 Critical — Crash, UB, or data corruption
  • 🟠 High — Incorrect behavior, resource leak under normal usage
  • 🟡 Medium — Wasted resources, logic errors with limited impact
  • 🔵 Low-Medium — Fragile patterns, may cause issues under specific conditions
  • Low — Input validation, code smell, future-proofing

🔴 Critical (1)

#1: PipeWire helper thread Use-After-Free on mainloop quit Fixed

  • Location: src/cap_portal.rs:348-357, 359-363
  • Description: mainloop.as_raw_ptr() is cast to usize, an untracked thread is spawned that calls pw_main_loop_quit through that raw pointer. If mainloop.run() returns or the PipeWire thread exits before CapPortal::drop sends shutdown, the helper can outlive mainloop and call into freed memory.
  • Fix: Remove the raw-pointer helper thread. Use a PipeWire loop event/eventfd/timer registered on the PipeWire loop, or store and join the helper thread with a lifetime guarantee that it exits before mainloop is dropped.
  • Fixed in: e40ef9e — Replaced detached helper thread + raw pointer with eventfd registered via add_io() on the PipeWire loop. Shutdown callback runs on the loop thread during mainloop.run(), guaranteeing mainloop is alive. Drop order ensures IO source is unregistered before mainloop is destroyed.

🟠 High (4)

#2: PipeWire process callback can block indefinitely Fixed

  • Location: src/cap_portal.rs:320
  • Description: frame_tx.send(PwEvent::Frame(frame)) is called from the PipeWire .process callback on a bounded channel of size 3. If encoding stalls, this blocks the PipeWire callback and delays stream.queue_raw_buffer(raw_buf).
  • Fix: Use try_send and drop frames on full, or use an unbounded channel with a backpressure counter.
  • Fixed in: a09a423 — Replaced send() with try_send() + AtomicU64 drop counter. PipeWire .process callback is now guaranteed non-blocking. Dropped frames are counted and logged every 30 occurrences.

#3: Portal PTS uses raw PipeWire timestamp as encoder frame-number timebase Fixed

  • Location: src/state_portal.rs:177, src/cap_portal.rs:279
  • Description: PipeWire PTS (from spa_meta_header.pts) is in nanoseconds (CLOCK_MONOTONIC), not frame-count units. It was assigned directly to AVFrame.pts while the encoder time_base is 1/fps, causing the muxer to interpret timestamps as billions of frames and producing corrupted duration metadata.
  • Fix: Record the first frame's PipeWire PTS as a nanosecond base, compute elapsed nanoseconds for each subsequent frame, then convert to frame numbers via elapsed_ns * fps / 1_000_000_000. Using elapsed time avoids i64 overflow on absolute timestamps. Invalid/negative PTS values are clamped to 0.
  • Fixed in: ffb36b7 — Added first_pts_ns: Option<i64> to StatePortal. PTS conversion matches WLR path pattern (state.rs:525-527).

#4: AVDRMFrameDescriptor leaks if Portal encode fails Fixed

  • Location: src/state_portal.rs:183
  • Description: The descriptor is allocated with Box::into_raw at line 120 and manually recovered at lines 187-191. But enc.encode_frame(&hw_frame)? returns early on error before recovery. This leaks the boxed AVDRMFrameDescriptor.
  • Fix: Wrap the descriptor pointer in a small RAII guard that reclaims the box on every return path, or recover the box before propagating the encode error.
  • Fixed in: 2d448dc — Moved Box::from_raw descriptor recovery from after encode_frame to before it. Since av_hwframe_transfer_data has already imported the DMA-BUF into the VAAPI surface by that point, the descriptor struct is safe to reclaim. Now encode_frame's ? early-return can no longer leak the descriptor.

#5: Portal can initialize encoder with zero or unknown format Fixed

  • Location: src/cap_portal.rs:306, src/state_portal.rs:68
  • Description: format_info.get().unwrap_or((0, 0, 0, 0)) permits sending a frame with width, height, and format set to zero before format negotiation has completed. StatePortal treats the first frame as authoritative and creates the encoder from those dimensions.
  • Fix: Do not emit PwEvent::Frame until format info is present and spa_to_drm_fourcc returns nonzero; otherwise requeue or drop the buffer.
  • Fixed in: b9e62d6 — Replaced unwrap_or((0,0,0,0)) with format_info.get() Some/None check plus explicit width/height/format == 0 guard, placed before libc::dup(fd) to avoid wasting a fd on dropped frames. Also partially fixes #15 (unsupported SPA format → drm_fourcc=0 is now rejected at source).

🟡 Medium (4)

#6: WLR timestamp offset is applied after rescaling using wrong units

  • Location: src/state.rs:525, src/avhw.rs:446
  • Description: WLR computes absolute PTS in encoder ticks, then drain_encoder rescales packet timestamps and subtracts start_ts afterward. If the muxer stream timebase changes after avformat_write_header, subtracting an encoder-timebase start_ts from stream-timebase packet timestamps is wrong.
  • Fix: Subtract the first frame PTS before sending frames to the encoder/filter, or rescale start_ts with the same source and destination timebases before subtracting.

#7: FFmpeg format context and IO leak on EncState::new error paths

  • Location: src/avhw.rs:256-323
  • Description: EncState::new allocates AVFormatContext with avformat_alloc_output_context2. After that, several bail! calls (lines 279, 285, 293, 319) can return before the raw pointer is wrapped by Output::wrap at line 323. This leaks fmt_ctx_ptr and, after avio_open, the opened AVIOContext.
  • Fix: Introduce a local RAII guard for AVFormatContext immediately after allocation. The guard calls avio_closep if pb is non-null and avformat_free_context unless ownership has been transferred. Disarm after wrap succeeds.

#8: FPS limiting happens after expensive WLR capture/allocation

  • Location: src/state.rs:383-549
  • Description: In the WLR path, queue_alloc_frame starts a screencopy request, on_frame_allocd allocates VAAPI surface, maps to DRM PRIME, duplicates DMA-BUF fds, creates Wayland buffer, and queues compositor copy. Only after the compositor signals ready does on_copy_complete apply FPS limiting. Frames that will be dropped still consume all these resources.
  • Fix: Move FPS gating before manager.capture_output in queue_alloc_frame. Track last_capture_requested or last_frame_emitted as an Instant, skip queueing if target interval has not elapsed.

#9: FpsLimit drops the second frame after every first encoded frame

  • Location: src/fps_limit.rs:19, src/state.rs:542, src/state_portal.rs:103
  • Description: Both state machines bypass FpsLimit for the first frame. Because that first frame is never fed into FpsLimit, the next frame calls on_new_frame with empty on_deck; it returns None, so the second frame is always dropped regardless of elapsed time.
  • Fix: Replace FpsLimit<T> with a timestamp-only limiter that stores last_emitted: Option<Instant> and returns a boolean. On the first encoded frame, set last_emitted = Some(now).

#10: Intermediate stage left installed on fatal initialization errors

  • Location: src/state.rs:582, 724
  • Description: negotiate_format and try_finalize_output use mem::replace(&mut self.stage, EncConstructionStage::Intermediate). If create_encoder fails or managers are missing, self.errored is set but self.stage remains Intermediate. Resources moved out are dropped implicitly, and future retry logic would operate on a meaningless state.
  • Fix: Replace raw mem::replace transitions with a helper that has explicit commit/rollback semantics. Build all fallible resources before moving the stage. When unavoidable, restore the old stage on failure or install a dedicated Failed stage.

🔵 Low-Medium (5)

#11: av_buffersrc_parameters_set failure can leak cloned hw_frames_ctx

  • Location: src/avhw.rs:538-557
  • Description: frames_rgb.ref_clone() is assigned to par.hw_frames_ctx. If av_buffersrc_parameters_set fails, the cloned AVBufferRef may not have transferred ownership, and the code does not call av_buffer_unref on it.
  • Fix: Store the cloned reference in a local mutable pointer. On failure, call av_buffer_unref(&mut cloned_ref).

#12: Portal session lifetime may be too short

  • Location: src/cap_portal.rs:83-134
  • Description: setup_portal creates an ashpd Screencast proxy and session, returns only (OwnedFd, node_id), and drops both proxy and session when it returns. If ashpd closes the screencast session on drop, capture may end immediately.
  • Fix: Verify ashpd session drop semantics. If dropping the session closes the portal session, store the session handle inside CapPortal for the entire capture lifetime.

#13: Portal descriptor uses borrowed DMA-BUF fd without clear lifetime contract across FFmpeg import

  • Location: src/state_portal.rs:217
  • Description: build_drm_descriptor stores frame.fd.as_raw_fd() into desc.objects[0].fd. The fd is owned by PwDmaBufFrame. If FFmpeg/VAAPI retains the fd beyond av_hwframe_transfer_data, PwDmaBufFrame dropping at end of handle_pw_frame can close an in-use fd.
  • Fix: Duplicate the fd for the descriptor if FFmpeg import is not guaranteed synchronous. Create an RAII owner that contains both the boxed descriptor and OwnedFd duplicates.

#14: WLR duplicate DMA-BUF fds passed to Wayland rely on subtle OwnedFd::as_fd ownership

  • Location: src/state.rs:463-482
  • Description: on_frame_allocd dups obj.fd, wraps with OwnedFd::from_raw_fd, and passes fd_owned.as_fd() into params.add. The comment says params.add() takes ownership, but code passes a BorrowedFd. fd_owned is dropped at end of loop iteration.
  • Fix: Confirm the generated signature for ZwpLinuxBufferParamsV1::add. If ownership transfer is required, use the binding's owned-fd API or transfer with into_raw_fd.

#15 (moved to Low): spa_to_drm_fourcc returns 0, downstream treats as usable

  • Location: src/cap_portal.rs:370-381
  • Description: Unsupported SPA video formats map to 0. This value is stored in format_info and later used in AVDRMFrameDescriptor.layers[0].format with no rejection path.
  • Fix: Make spa_to_drm_fourcc return Option<u32>. On None, send a fatal PwEvent::Error or renegotiate.

Low (8)

#15: --fps 0 causes panic

  • Location: src/fps_limit.rs:12, src/avhw.rs:217
  • Description: 1.0 / 0.0u32 produces infinite duration. Encoder timebase Rational::new(1, 0) is also invalid.
  • Fix: Validate args.fps > 0 after parsing. Use a clap value parser.

#16: to_str().unwrap() panics on non-UTF-8 paths

  • Location: src/avhw.rs:24, 255
  • Description: drm_device.to_str().unwrap() and output_path.to_str().unwrap() panic on non-UTF-8 paths.
  • Fix: Replace with path.to_str().ok_or_else(|| anyhow!("path is not valid UTF-8: {}", path.display()))?.

#17: Portal shutdown swallows encoder errors

  • Location: src/state_portal.rs:198-209, src/main.rs:227
  • Description: StatePortal::shutdown() catches and logs flush errors but returns (). run_portal_pipewire always returns Ok(()) even if trailer writing failed.
  • Fix: Change shutdown to return Result<()>. Propagate in run_portal_pipewire.

#18: WLR shutdown discards buffered FPS-limiter frame

  • Location: src/main.rs:167, src/state.rs:546
  • Description: state.fps_limit.flush() return value is ignored. The limiter stores S::Frame::default() (not real frame data), so no actual frame is lost, but the design is misleading.
  • Fix: Use timestamp-only limiter (see #9). If keeping FpsLimit, log the discarded flush value.

#19: StatePortal drops first PipeWire frame after format negotiation

  • Location: src/state_portal.rs:56-82
  • Description: First PwEvent::Frame is used only for encoder creation, then drop(frame). The first actual image is not encoded, creating a startup gap and interacting with FPS limiter issue #9.
  • Fix: After encoder creation, pass the same frame to handle_pw_frame(frame) instead of dropping. Or set FPS limiter baseline correctly.

#20: CapPortal::Drop can block indefinitely on join

  • Location: src/cap_portal.rs:137-144
  • Description: Drop sends PwCmd::Shutdown then unconditionally join()s. If the command channel send fails or pw_main_loop_quit doesn't wake the loop, join blocks forever.
  • Fix: Replace helper thread with PW-loop-integrated wakeup. If keeping thread design, add a bounded shutdown timeout.

#21: pw::deinit() called from worker thread with process-wide implications

  • Location: src/cap_portal.rs:155, 361
  • Description: pw::init()/pw::deinit() are process-global. Deinit inside a worker thread is fragile if future code adds another PipeWire user.
  • Fix: Treat PW init as process-level lifecycle. Init once at startup, deinit once at shutdown, or omit explicit deinit.

#22: PipeWire callback assumes first data item contains complete DMA-BUF image

  • Location: src/cap_portal.rs:260
  • Description: Only reads datas[0]. Multi-plane formats (NV12 etc.) would produce corrupted frames.
  • Fix: Validate that negotiated DRM format is single-plane RGB/XRGB before emitting frames.

#24: CapPortal stores Tokio runtime but not portal session objects

  • Location: src/cap_portal.rs:31
  • Description: rt: Runtime is stored but not used after setup_portal. The runtime does not keep ashpd proxy/session alive.
  • Fix: Remove rt if not needed, or store actual session/proxy handles alongside it.

#25: poll_and_encode collapses all channel errors into "no event"

  • Location: src/state_portal.rs:49-53
  • Description: try_recv errors are all mapped to Ok(false). A disconnected channel (PipeWire thread crashed) is indistinguishable from an empty channel.
  • Fix: Match on RecvError variant. On Disconnected, set self.errored = true.

Summary

Severity Count
Critical 1
High 4
Medium 5
Low-Medium 5
Low 10
Total 25
  1. #1 (Critical UAF) — PipeWire mainloop shutdown race
  2. #4 (High leak) — AVDRMFrameDescriptor leak on encode error
  3. #2 (High blocking) — PipeWire process callback blocking
  4. #5 (High validation) — Zero-format encoder init
  5. #3 (High PTS) — Portal PTS unit mismatch