diff --git a/.sisyphus/BUGS.md b/.sisyphus/BUGS.md new file mode 100644 index 0000000..3cd6c4d --- /dev/null +++ b/.sisyphus/BUGS.md @@ -0,0 +1,203 @@ +# wl-webrtc Bug Analysis Report + +> Generated by Oracle audit on 2026-05-22. Total: 25 bugs. + +## Severity Legend + +- 🔴 **Critical** — Crash, UB, or data corruption +- 🟠 **High** — Incorrect behavior, resource leak under normal usage +- 🟡 **Medium** — Wasted resources, logic errors with limited impact +- 🔵 **Low-Medium** — Fragile patterns, may cause issues under specific conditions +- ⚪ **Low** — Input validation, code smell, future-proofing + +--- + +## 🔴 Critical (1) + +### ~~#1: PipeWire helper thread Use-After-Free on mainloop quit~~ ✅ Fixed + +- **Location**: `src/cap_portal.rs:348-357, 359-363` +- **Description**: `mainloop.as_raw_ptr()` is cast to `usize`, an untracked thread is spawned that calls `pw_main_loop_quit` through that raw pointer. If `mainloop.run()` returns or the PipeWire thread exits before `CapPortal::drop` sends shutdown, the helper can outlive `mainloop` and call into freed memory. +- **Fix**: Remove the raw-pointer helper thread. Use a PipeWire loop event/eventfd/timer registered on the PipeWire loop, or store and join the helper thread with a lifetime guarantee that it exits before `mainloop` is dropped. +- **Fixed in**: [`e40ef9e`](commit:e40ef9e) — Replaced detached helper thread + raw pointer with eventfd registered via `add_io()` on the PipeWire loop. Shutdown callback runs on the loop thread during `mainloop.run()`, guaranteeing mainloop is alive. Drop order ensures IO source is unregistered before mainloop is destroyed. + +--- + +## 🟠 High (4) + +### #2: PipeWire process callback can block indefinitely + +- **Location**: `src/cap_portal.rs:320` +- **Description**: `frame_tx.send(PwEvent::Frame(frame))` is called from the PipeWire `.process` callback on a bounded channel of size 3. If encoding stalls, this blocks the PipeWire callback and delays `stream.queue_raw_buffer(raw_buf)`. +- **Fix**: Use `try_send` and drop frames on full, or use an unbounded channel with a backpressure counter. + +### #3: Portal PTS uses raw PipeWire timestamp as encoder frame-number timebase + +- **Location**: `src/state_portal.rs:177`, `src/cap_portal.rs:279` +- **Description**: PipeWire PTS (from `spa_meta_header.pts`) is not guaranteed to be in frame-count units — it may be nanoseconds or an invalid sentinel. It is assigned directly to `AVFrame.pts` while the encoder time_base is `1/fps`. +- **Fix**: Normalize timestamps at the Portal boundary: store the first valid PipeWire PTS, convert elapsed time to encoder ticks with `elapsed_ns * fps / 1_000_000_000`, and synthesize monotonic frame numbers when PTS is invalid. + +### #4: AVDRMFrameDescriptor leaks if Portal encode fails + +- **Location**: `src/state_portal.rs:183` +- **Description**: The descriptor is allocated with `Box::into_raw` at line 120 and manually recovered at lines 187-191. But `enc.encode_frame(&hw_frame)?` returns early on error before recovery. This leaks the boxed `AVDRMFrameDescriptor`. +- **Fix**: Wrap the descriptor pointer in a small RAII guard that reclaims the box on every return path, or recover the box before propagating the encode error. + +### #5: Portal can initialize encoder with zero or unknown format + +- **Location**: `src/cap_portal.rs:306`, `src/state_portal.rs:68` +- **Description**: `format_info.get().unwrap_or((0, 0, 0, 0))` permits sending a frame with width, height, and format set to zero before format negotiation has completed. `StatePortal` treats the first frame as authoritative and creates the encoder from those dimensions. +- **Fix**: Do not emit `PwEvent::Frame` until format info is present and `spa_to_drm_fourcc` returns nonzero; otherwise requeue or drop the buffer. + +--- + +## 🟡 Medium (4) + +### #6: WLR timestamp offset is applied after rescaling using wrong units + +- **Location**: `src/state.rs:525`, `src/avhw.rs:446` +- **Description**: WLR computes absolute PTS in encoder ticks, then `drain_encoder` rescales packet timestamps and subtracts `start_ts` afterward. If the muxer stream timebase changes after `avformat_write_header`, subtracting an encoder-timebase `start_ts` from stream-timebase packet timestamps is wrong. +- **Fix**: Subtract the first frame PTS before sending frames to the encoder/filter, or rescale `start_ts` with the same source and destination timebases before subtracting. + +### #7: FFmpeg format context and IO leak on EncState::new error paths + +- **Location**: `src/avhw.rs:256-323` +- **Description**: `EncState::new` allocates `AVFormatContext` with `avformat_alloc_output_context2`. After that, several `bail!` calls (lines 279, 285, 293, 319) can return before the raw pointer is wrapped by `Output::wrap` at line 323. This leaks `fmt_ctx_ptr` and, after `avio_open`, the opened `AVIOContext`. +- **Fix**: Introduce a local RAII guard for `AVFormatContext` immediately after allocation. The guard calls `avio_closep` if `pb` is non-null and `avformat_free_context` unless ownership has been transferred. Disarm after `wrap` succeeds. + +### #8: FPS limiting happens after expensive WLR capture/allocation + +- **Location**: `src/state.rs:383-549` +- **Description**: In the WLR path, `queue_alloc_frame` starts a screencopy request, `on_frame_allocd` allocates VAAPI surface, maps to DRM PRIME, duplicates DMA-BUF fds, creates Wayland buffer, and queues compositor copy. Only after the compositor signals ready does `on_copy_complete` apply FPS limiting. Frames that will be dropped still consume all these resources. +- **Fix**: Move FPS gating before `manager.capture_output` in `queue_alloc_frame`. Track `last_capture_requested` or `last_frame_emitted` as an `Instant`, skip queueing if target interval has not elapsed. + +### #9: FpsLimit drops the second frame after every first encoded frame + +- **Location**: `src/fps_limit.rs:19`, `src/state.rs:542`, `src/state_portal.rs:103` +- **Description**: Both state machines bypass `FpsLimit` for the first frame. Because that first frame is never fed into `FpsLimit`, the next frame calls `on_new_frame` with empty `on_deck`; it returns `None`, so the second frame is always dropped regardless of elapsed time. +- **Fix**: Replace `FpsLimit` with a timestamp-only limiter that stores `last_emitted: Option` and returns a boolean. On the first encoded frame, set `last_emitted = Some(now)`. + +### #10: Intermediate stage left installed on fatal initialization errors + +- **Location**: `src/state.rs:582, 724` +- **Description**: `negotiate_format` and `try_finalize_output` use `mem::replace(&mut self.stage, EncConstructionStage::Intermediate)`. If `create_encoder` fails or managers are missing, `self.errored` is set but `self.stage` remains `Intermediate`. Resources moved out are dropped implicitly, and future retry logic would operate on a meaningless state. +- **Fix**: Replace raw `mem::replace` transitions with a helper that has explicit commit/rollback semantics. Build all fallible resources before moving the stage. When unavoidable, restore the old stage on failure or install a dedicated `Failed` stage. + +--- + +## 🔵 Low-Medium (5) + +### #11: av_buffersrc_parameters_set failure can leak cloned hw_frames_ctx + +- **Location**: `src/avhw.rs:538-557` +- **Description**: `frames_rgb.ref_clone()` is assigned to `par.hw_frames_ctx`. If `av_buffersrc_parameters_set` fails, the cloned `AVBufferRef` may not have transferred ownership, and the code does not call `av_buffer_unref` on it. +- **Fix**: Store the cloned reference in a local mutable pointer. On failure, call `av_buffer_unref(&mut cloned_ref)`. + +### #12: Portal session lifetime may be too short + +- **Location**: `src/cap_portal.rs:83-134` +- **Description**: `setup_portal` creates an ashpd Screencast proxy and session, returns only `(OwnedFd, node_id)`, and drops both proxy and session when it returns. If ashpd closes the screencast session on drop, capture may end immediately. +- **Fix**: Verify ashpd session drop semantics. If dropping the session closes the portal session, store the session handle inside `CapPortal` for the entire capture lifetime. + +### #13: Portal descriptor uses borrowed DMA-BUF fd without clear lifetime contract across FFmpeg import + +- **Location**: `src/state_portal.rs:217` +- **Description**: `build_drm_descriptor` stores `frame.fd.as_raw_fd()` into `desc.objects[0].fd`. The fd is owned by `PwDmaBufFrame`. If FFmpeg/VAAPI retains the fd beyond `av_hwframe_transfer_data`, `PwDmaBufFrame` dropping at end of `handle_pw_frame` can close an in-use fd. +- **Fix**: Duplicate the fd for the descriptor if FFmpeg import is not guaranteed synchronous. Create an RAII owner that contains both the boxed descriptor and `OwnedFd` duplicates. + +### #14: WLR duplicate DMA-BUF fds passed to Wayland rely on subtle OwnedFd::as_fd ownership + +- **Location**: `src/state.rs:463-482` +- **Description**: `on_frame_allocd` dups `obj.fd`, wraps with `OwnedFd::from_raw_fd`, and passes `fd_owned.as_fd()` into `params.add`. The comment says `params.add()` takes ownership, but code passes a `BorrowedFd`. `fd_owned` is dropped at end of loop iteration. +- **Fix**: Confirm the generated signature for `ZwpLinuxBufferParamsV1::add`. If ownership transfer is required, use the binding's owned-fd API or transfer with `into_raw_fd`. + +### #15 (moved to Low): spa_to_drm_fourcc returns 0, downstream treats as usable + +- **Location**: `src/cap_portal.rs:370-381` +- **Description**: Unsupported SPA video formats map to `0`. This value is stored in `format_info` and later used in `AVDRMFrameDescriptor.layers[0].format` with no rejection path. +- **Fix**: Make `spa_to_drm_fourcc` return `Option`. On `None`, send a fatal `PwEvent::Error` or renegotiate. + +--- + +## ⚪ Low (8) + +### #15: --fps 0 causes panic + +- **Location**: `src/fps_limit.rs:12`, `src/avhw.rs:217` +- **Description**: `1.0 / 0.0u32` produces infinite duration. Encoder timebase `Rational::new(1, 0)` is also invalid. +- **Fix**: Validate `args.fps > 0` after parsing. Use a clap value parser. + +### #16: to_str().unwrap() panics on non-UTF-8 paths + +- **Location**: `src/avhw.rs:24, 255` +- **Description**: `drm_device.to_str().unwrap()` and `output_path.to_str().unwrap()` panic on non-UTF-8 paths. +- **Fix**: Replace with `path.to_str().ok_or_else(|| anyhow!("path is not valid UTF-8: {}", path.display()))?`. + +### #17: Portal shutdown swallows encoder errors + +- **Location**: `src/state_portal.rs:198-209`, `src/main.rs:227` +- **Description**: `StatePortal::shutdown()` catches and logs flush errors but returns `()`. `run_portal_pipewire` always returns `Ok(())` even if trailer writing failed. +- **Fix**: Change `shutdown` to return `Result<()>`. Propagate in `run_portal_pipewire`. + +### #18: WLR shutdown discards buffered FPS-limiter frame + +- **Location**: `src/main.rs:167`, `src/state.rs:546` +- **Description**: `state.fps_limit.flush()` return value is ignored. The limiter stores `S::Frame::default()` (not real frame data), so no actual frame is lost, but the design is misleading. +- **Fix**: Use timestamp-only limiter (see #9). If keeping FpsLimit, log the discarded flush value. + +### #19: StatePortal drops first PipeWire frame after format negotiation + +- **Location**: `src/state_portal.rs:56-82` +- **Description**: First `PwEvent::Frame` is used only for encoder creation, then `drop(frame)`. The first actual image is not encoded, creating a startup gap and interacting with FPS limiter issue #9. +- **Fix**: After encoder creation, pass the same frame to `handle_pw_frame(frame)` instead of dropping. Or set FPS limiter baseline correctly. + +### #20: CapPortal::Drop can block indefinitely on join + +- **Location**: `src/cap_portal.rs:137-144` +- **Description**: `Drop` sends `PwCmd::Shutdown` then unconditionally `join()`s. If the command channel send fails or `pw_main_loop_quit` doesn't wake the loop, `join` blocks forever. +- **Fix**: Replace helper thread with PW-loop-integrated wakeup. If keeping thread design, add a bounded shutdown timeout. + +### #21: pw::deinit() called from worker thread with process-wide implications + +- **Location**: `src/cap_portal.rs:155, 361` +- **Description**: `pw::init()`/`pw::deinit()` are process-global. Deinit inside a worker thread is fragile if future code adds another PipeWire user. +- **Fix**: Treat PW init as process-level lifecycle. Init once at startup, deinit once at shutdown, or omit explicit deinit. + +### #22: PipeWire callback assumes first data item contains complete DMA-BUF image + +- **Location**: `src/cap_portal.rs:260` +- **Description**: Only reads `datas[0]`. Multi-plane formats (NV12 etc.) would produce corrupted frames. +- **Fix**: Validate that negotiated DRM format is single-plane RGB/XRGB before emitting frames. + +### #24: CapPortal stores Tokio runtime but not portal session objects + +- **Location**: `src/cap_portal.rs:31` +- **Description**: `rt: Runtime` is stored but not used after `setup_portal`. The runtime does not keep `ashpd` proxy/session alive. +- **Fix**: Remove `rt` if not needed, or store actual session/proxy handles alongside it. + +### #25: poll_and_encode collapses all channel errors into "no event" + +- **Location**: `src/state_portal.rs:49-53` +- **Description**: `try_recv` errors are all mapped to `Ok(false)`. A disconnected channel (PipeWire thread crashed) is indistinguishable from an empty channel. +- **Fix**: Match on `RecvError` variant. On `Disconnected`, set `self.errored = true`. + +--- + +## Summary + +| Severity | Count | +|----------|-------| +| Critical | 1 | +| High | 4 | +| Medium | 5 | +| Low-Medium | 5 | +| Low | 10 | +| **Total** | **25** | + +### Recommended Fix Priority + +1. **#1** (Critical UAF) — PipeWire mainloop shutdown race +2. **#4** (High leak) — AVDRMFrameDescriptor leak on encode error +3. **#2** (High blocking) — PipeWire process callback blocking +4. **#5** (High validation) — Zero-format encoder init +5. **#3** (High PTS) — Portal PTS unit mismatch