From 14d1cf173a7cb4544cb07379ea7dc2fe08882aec Mon Sep 17 00:00:00 2001 From: dailz Date: Mon, 25 May 2026 08:56:21 +0800 Subject: [PATCH] chore: add .sisyphus/ to .gitignore, remove committed BUGS.md The .gitignore had a comment about Sisyphus artifacts but was missing the actual pattern. BUGS.md is replaced by the .sisyphus/ ignore rule. --- .gitignore | 1 + .sisyphus/BUGS.md | 207 ---------------------------------------------- 2 files changed, 1 insertion(+), 207 deletions(-) delete mode 100644 .sisyphus/BUGS.md diff --git a/.gitignore b/.gitignore index 95df177..3d10387 100644 --- a/.gitignore +++ b/.gitignore @@ -16,3 +16,4 @@ Thumbs.db # Sisyphus orchestration artifacts +.sisyphus/ diff --git a/.sisyphus/BUGS.md b/.sisyphus/BUGS.md deleted file mode 100644 index 967bd05..0000000 --- a/.sisyphus/BUGS.md +++ /dev/null @@ -1,207 +0,0 @@ -# wl-webrtc Bug Analysis Report - -> Generated by Oracle audit on 2026-05-22. Total: 25 bugs. - -## Severity Legend - -- 🔴 **Critical** — Crash, UB, or data corruption -- 🟠 **High** — Incorrect behavior, resource leak under normal usage -- 🟡 **Medium** — Wasted resources, logic errors with limited impact -- 🔵 **Low-Medium** — Fragile patterns, may cause issues under specific conditions -- ⚪ **Low** — Input validation, code smell, future-proofing - ---- - -## 🔴 Critical (1) - -### ~~#1: PipeWire helper thread Use-After-Free on mainloop quit~~ ✅ Fixed - -- **Location**: `src/cap_portal.rs:348-357, 359-363` -- **Description**: `mainloop.as_raw_ptr()` is cast to `usize`, an untracked thread is spawned that calls `pw_main_loop_quit` through that raw pointer. If `mainloop.run()` returns or the PipeWire thread exits before `CapPortal::drop` sends shutdown, the helper can outlive `mainloop` and call into freed memory. -- **Fix**: Remove the raw-pointer helper thread. Use a PipeWire loop event/eventfd/timer registered on the PipeWire loop, or store and join the helper thread with a lifetime guarantee that it exits before `mainloop` is dropped. -- **Fixed in**: [`e40ef9e`](commit:e40ef9e) — Replaced detached helper thread + raw pointer with eventfd registered via `add_io()` on the PipeWire loop. Shutdown callback runs on the loop thread during `mainloop.run()`, guaranteeing mainloop is alive. Drop order ensures IO source is unregistered before mainloop is destroyed. - ---- - -## 🟠 High (4) - -### ~~#2: PipeWire process callback can block indefinitely~~ ✅ Fixed - -- **Location**: `src/cap_portal.rs:320` -- **Description**: `frame_tx.send(PwEvent::Frame(frame))` is called from the PipeWire `.process` callback on a bounded channel of size 3. If encoding stalls, this blocks the PipeWire callback and delays `stream.queue_raw_buffer(raw_buf)`. -- **Fix**: Use `try_send` and drop frames on full, or use an unbounded channel with a backpressure counter. -- **Fixed in**: [`a09a423`](commit:a09a423) — Replaced `send()` with `try_send()` + `AtomicU64` drop counter. PipeWire `.process` callback is now guaranteed non-blocking. Dropped frames are counted and logged every 30 occurrences. - -### ~~#3: Portal PTS uses raw PipeWire timestamp as encoder frame-number timebase~~ ✅ Fixed - -- **Location**: `src/state_portal.rs:177`, `src/cap_portal.rs:279` -- **Description**: PipeWire PTS (from `spa_meta_header.pts`) is in nanoseconds (CLOCK_MONOTONIC), not frame-count units. It was assigned directly to `AVFrame.pts` while the encoder time_base is `1/fps`, causing the muxer to interpret timestamps as billions of frames and producing corrupted duration metadata. -- **Fix**: Record the first frame's PipeWire PTS as a nanosecond base, compute elapsed nanoseconds for each subsequent frame, then convert to frame numbers via `elapsed_ns * fps / 1_000_000_000`. Using elapsed time avoids i64 overflow on absolute timestamps. Invalid/negative PTS values are clamped to 0. -- **Fixed in**: [`ffb36b7`](commit:ffb36b7) — Added `first_pts_ns: Option` to `StatePortal`. PTS conversion matches WLR path pattern (`state.rs:525-527`). - -### ~~#4: AVDRMFrameDescriptor leaks if Portal encode fails~~ ✅ Fixed - -- **Location**: `src/state_portal.rs:183` -- **Description**: The descriptor is allocated with `Box::into_raw` at line 120 and manually recovered at lines 187-191. But `enc.encode_frame(&hw_frame)?` returns early on error before recovery. This leaks the boxed `AVDRMFrameDescriptor`. -- **Fix**: Wrap the descriptor pointer in a small RAII guard that reclaims the box on every return path, or recover the box before propagating the encode error. -- **Fixed in**: [`2d448dc`](commit:2d448dc) — Moved `Box::from_raw` descriptor recovery from after `encode_frame` to before it. Since `av_hwframe_transfer_data` has already imported the DMA-BUF into the VAAPI surface by that point, the descriptor struct is safe to reclaim. Now `encode_frame`'s `?` early-return can no longer leak the descriptor. - -### ~~#5: Portal can initialize encoder with zero or unknown format~~ ✅ Fixed - -- **Location**: `src/cap_portal.rs:306`, `src/state_portal.rs:68` -- **Description**: `format_info.get().unwrap_or((0, 0, 0, 0))` permits sending a frame with width, height, and format set to zero before format negotiation has completed. `StatePortal` treats the first frame as authoritative and creates the encoder from those dimensions. -- **Fix**: Do not emit `PwEvent::Frame` until format info is present and `spa_to_drm_fourcc` returns nonzero; otherwise requeue or drop the buffer. -- **Fixed in**: [`b9e62d6`](commit:b9e62d6) — Replaced `unwrap_or((0,0,0,0))` with `format_info.get()` `Some`/`None` check plus explicit `width/height/format == 0` guard, placed *before* `libc::dup(fd)` to avoid wasting a fd on dropped frames. Also partially fixes #15 (unsupported SPA format → drm_fourcc=0 is now rejected at source). - ---- - -## 🟡 Medium (4) - -### #6: WLR timestamp offset is applied after rescaling using wrong units - -- **Location**: `src/state.rs:525`, `src/avhw.rs:446` -- **Description**: WLR computes absolute PTS in encoder ticks, then `drain_encoder` rescales packet timestamps and subtracts `start_ts` afterward. If the muxer stream timebase changes after `avformat_write_header`, subtracting an encoder-timebase `start_ts` from stream-timebase packet timestamps is wrong. -- **Fix**: Subtract the first frame PTS before sending frames to the encoder/filter, or rescale `start_ts` with the same source and destination timebases before subtracting. - -### #7: FFmpeg format context and IO leak on EncState::new error paths - -- **Location**: `src/avhw.rs:256-323` -- **Description**: `EncState::new` allocates `AVFormatContext` with `avformat_alloc_output_context2`. After that, several `bail!` calls (lines 279, 285, 293, 319) can return before the raw pointer is wrapped by `Output::wrap` at line 323. This leaks `fmt_ctx_ptr` and, after `avio_open`, the opened `AVIOContext`. -- **Fix**: Introduce a local RAII guard for `AVFormatContext` immediately after allocation. The guard calls `avio_closep` if `pb` is non-null and `avformat_free_context` unless ownership has been transferred. Disarm after `wrap` succeeds. - -### #8: FPS limiting happens after expensive WLR capture/allocation - -- **Location**: `src/state.rs:383-549` -- **Description**: In the WLR path, `queue_alloc_frame` starts a screencopy request, `on_frame_allocd` allocates VAAPI surface, maps to DRM PRIME, duplicates DMA-BUF fds, creates Wayland buffer, and queues compositor copy. Only after the compositor signals ready does `on_copy_complete` apply FPS limiting. Frames that will be dropped still consume all these resources. -- **Fix**: Move FPS gating before `manager.capture_output` in `queue_alloc_frame`. Track `last_capture_requested` or `last_frame_emitted` as an `Instant`, skip queueing if target interval has not elapsed. - -### #9: FpsLimit drops the second frame after every first encoded frame - -- **Location**: `src/fps_limit.rs:19`, `src/state.rs:542`, `src/state_portal.rs:103` -- **Description**: Both state machines bypass `FpsLimit` for the first frame. Because that first frame is never fed into `FpsLimit`, the next frame calls `on_new_frame` with empty `on_deck`; it returns `None`, so the second frame is always dropped regardless of elapsed time. -- **Fix**: Replace `FpsLimit` with a timestamp-only limiter that stores `last_emitted: Option` and returns a boolean. On the first encoded frame, set `last_emitted = Some(now)`. - -### #10: Intermediate stage left installed on fatal initialization errors - -- **Location**: `src/state.rs:582, 724` -- **Description**: `negotiate_format` and `try_finalize_output` use `mem::replace(&mut self.stage, EncConstructionStage::Intermediate)`. If `create_encoder` fails or managers are missing, `self.errored` is set but `self.stage` remains `Intermediate`. Resources moved out are dropped implicitly, and future retry logic would operate on a meaningless state. -- **Fix**: Replace raw `mem::replace` transitions with a helper that has explicit commit/rollback semantics. Build all fallible resources before moving the stage. When unavoidable, restore the old stage on failure or install a dedicated `Failed` stage. - ---- - -## 🔵 Low-Medium (5) - -### #11: av_buffersrc_parameters_set failure can leak cloned hw_frames_ctx - -- **Location**: `src/avhw.rs:538-557` -- **Description**: `frames_rgb.ref_clone()` is assigned to `par.hw_frames_ctx`. If `av_buffersrc_parameters_set` fails, the cloned `AVBufferRef` may not have transferred ownership, and the code does not call `av_buffer_unref` on it. -- **Fix**: Store the cloned reference in a local mutable pointer. On failure, call `av_buffer_unref(&mut cloned_ref)`. - -### #12: Portal session lifetime may be too short - -- **Location**: `src/cap_portal.rs:83-134` -- **Description**: `setup_portal` creates an ashpd Screencast proxy and session, returns only `(OwnedFd, node_id)`, and drops both proxy and session when it returns. If ashpd closes the screencast session on drop, capture may end immediately. -- **Fix**: Verify ashpd session drop semantics. If dropping the session closes the portal session, store the session handle inside `CapPortal` for the entire capture lifetime. - -### #13: Portal descriptor uses borrowed DMA-BUF fd without clear lifetime contract across FFmpeg import - -- **Location**: `src/state_portal.rs:217` -- **Description**: `build_drm_descriptor` stores `frame.fd.as_raw_fd()` into `desc.objects[0].fd`. The fd is owned by `PwDmaBufFrame`. If FFmpeg/VAAPI retains the fd beyond `av_hwframe_transfer_data`, `PwDmaBufFrame` dropping at end of `handle_pw_frame` can close an in-use fd. -- **Fix**: Duplicate the fd for the descriptor if FFmpeg import is not guaranteed synchronous. Create an RAII owner that contains both the boxed descriptor and `OwnedFd` duplicates. - -### #14: WLR duplicate DMA-BUF fds passed to Wayland rely on subtle OwnedFd::as_fd ownership - -- **Location**: `src/state.rs:463-482` -- **Description**: `on_frame_allocd` dups `obj.fd`, wraps with `OwnedFd::from_raw_fd`, and passes `fd_owned.as_fd()` into `params.add`. The comment says `params.add()` takes ownership, but code passes a `BorrowedFd`. `fd_owned` is dropped at end of loop iteration. -- **Fix**: Confirm the generated signature for `ZwpLinuxBufferParamsV1::add`. If ownership transfer is required, use the binding's owned-fd API or transfer with `into_raw_fd`. - -### #15 (moved to Low): spa_to_drm_fourcc returns 0, downstream treats as usable - -- **Location**: `src/cap_portal.rs:370-381` -- **Description**: Unsupported SPA video formats map to `0`. This value is stored in `format_info` and later used in `AVDRMFrameDescriptor.layers[0].format` with no rejection path. -- **Fix**: Make `spa_to_drm_fourcc` return `Option`. On `None`, send a fatal `PwEvent::Error` or renegotiate. - ---- - -## ⚪ Low (8) - -### #15: --fps 0 causes panic - -- **Location**: `src/fps_limit.rs:12`, `src/avhw.rs:217` -- **Description**: `1.0 / 0.0u32` produces infinite duration. Encoder timebase `Rational::new(1, 0)` is also invalid. -- **Fix**: Validate `args.fps > 0` after parsing. Use a clap value parser. - -### #16: to_str().unwrap() panics on non-UTF-8 paths - -- **Location**: `src/avhw.rs:24, 255` -- **Description**: `drm_device.to_str().unwrap()` and `output_path.to_str().unwrap()` panic on non-UTF-8 paths. -- **Fix**: Replace with `path.to_str().ok_or_else(|| anyhow!("path is not valid UTF-8: {}", path.display()))?`. - -### #17: Portal shutdown swallows encoder errors - -- **Location**: `src/state_portal.rs:198-209`, `src/main.rs:227` -- **Description**: `StatePortal::shutdown()` catches and logs flush errors but returns `()`. `run_portal_pipewire` always returns `Ok(())` even if trailer writing failed. -- **Fix**: Change `shutdown` to return `Result<()>`. Propagate in `run_portal_pipewire`. - -### #18: WLR shutdown discards buffered FPS-limiter frame - -- **Location**: `src/main.rs:167`, `src/state.rs:546` -- **Description**: `state.fps_limit.flush()` return value is ignored. The limiter stores `S::Frame::default()` (not real frame data), so no actual frame is lost, but the design is misleading. -- **Fix**: Use timestamp-only limiter (see #9). If keeping FpsLimit, log the discarded flush value. - -### #19: StatePortal drops first PipeWire frame after format negotiation - -- **Location**: `src/state_portal.rs:56-82` -- **Description**: First `PwEvent::Frame` is used only for encoder creation, then `drop(frame)`. The first actual image is not encoded, creating a startup gap and interacting with FPS limiter issue #9. -- **Fix**: After encoder creation, pass the same frame to `handle_pw_frame(frame)` instead of dropping. Or set FPS limiter baseline correctly. - -### #20: CapPortal::Drop can block indefinitely on join - -- **Location**: `src/cap_portal.rs:137-144` -- **Description**: `Drop` sends `PwCmd::Shutdown` then unconditionally `join()`s. If the command channel send fails or `pw_main_loop_quit` doesn't wake the loop, `join` blocks forever. -- **Fix**: Replace helper thread with PW-loop-integrated wakeup. If keeping thread design, add a bounded shutdown timeout. - -### #21: pw::deinit() called from worker thread with process-wide implications - -- **Location**: `src/cap_portal.rs:155, 361` -- **Description**: `pw::init()`/`pw::deinit()` are process-global. Deinit inside a worker thread is fragile if future code adds another PipeWire user. -- **Fix**: Treat PW init as process-level lifecycle. Init once at startup, deinit once at shutdown, or omit explicit deinit. - -### #22: PipeWire callback assumes first data item contains complete DMA-BUF image - -- **Location**: `src/cap_portal.rs:260` -- **Description**: Only reads `datas[0]`. Multi-plane formats (NV12 etc.) would produce corrupted frames. -- **Fix**: Validate that negotiated DRM format is single-plane RGB/XRGB before emitting frames. - -### #24: CapPortal stores Tokio runtime but not portal session objects - -- **Location**: `src/cap_portal.rs:31` -- **Description**: `rt: Runtime` is stored but not used after `setup_portal`. The runtime does not keep `ashpd` proxy/session alive. -- **Fix**: Remove `rt` if not needed, or store actual session/proxy handles alongside it. - -### #25: poll_and_encode collapses all channel errors into "no event" - -- **Location**: `src/state_portal.rs:49-53` -- **Description**: `try_recv` errors are all mapped to `Ok(false)`. A disconnected channel (PipeWire thread crashed) is indistinguishable from an empty channel. -- **Fix**: Match on `RecvError` variant. On `Disconnected`, set `self.errored = true`. - ---- - -## Summary - -| Severity | Count | -|----------|-------| -| Critical | 1 | -| High | 4 | -| Medium | 5 | -| Low-Medium | 5 | -| Low | 10 | -| **Total** | **25** | - -### Recommended Fix Priority - -1. **#1** (Critical UAF) — PipeWire mainloop shutdown race -2. **#4** (High leak) — AVDRMFrameDescriptor leak on encode error -3. **#2** (High blocking) — PipeWire process callback blocking -4. **#5** (High validation) — Zero-format encoder init -5. **#3** (High PTS) — Portal PTS unit mismatch