chore: add .sisyphus/ to .gitignore, remove committed BUGS.md

The .gitignore had a comment about Sisyphus artifacts but was missing
the actual pattern. BUGS.md is replaced by the .sisyphus/ ignore rule.
This commit is contained in:
dailz
2026-05-25 08:56:21 +08:00
parent 573569ade7
commit 14d1cf173a
2 changed files with 1 additions and 207 deletions

1
.gitignore vendored
View File

@@ -16,3 +16,4 @@
Thumbs.db
# Sisyphus orchestration artifacts
.sisyphus/

View File

@@ -1,207 +0,0 @@
# wl-webrtc Bug Analysis Report
> Generated by Oracle audit on 2026-05-22. Total: 25 bugs.
## Severity Legend
- 🔴 **Critical** — Crash, UB, or data corruption
- 🟠 **High** — Incorrect behavior, resource leak under normal usage
- 🟡 **Medium** — Wasted resources, logic errors with limited impact
- 🔵 **Low-Medium** — Fragile patterns, may cause issues under specific conditions
-**Low** — Input validation, code smell, future-proofing
---
## 🔴 Critical (1)
### ~~#1: PipeWire helper thread Use-After-Free on mainloop quit~~ ✅ Fixed
- **Location**: `src/cap_portal.rs:348-357, 359-363`
- **Description**: `mainloop.as_raw_ptr()` is cast to `usize`, an untracked thread is spawned that calls `pw_main_loop_quit` through that raw pointer. If `mainloop.run()` returns or the PipeWire thread exits before `CapPortal::drop` sends shutdown, the helper can outlive `mainloop` and call into freed memory.
- **Fix**: Remove the raw-pointer helper thread. Use a PipeWire loop event/eventfd/timer registered on the PipeWire loop, or store and join the helper thread with a lifetime guarantee that it exits before `mainloop` is dropped.
- **Fixed in**: [`e40ef9e`](commit:e40ef9e) — Replaced detached helper thread + raw pointer with eventfd registered via `add_io()` on the PipeWire loop. Shutdown callback runs on the loop thread during `mainloop.run()`, guaranteeing mainloop is alive. Drop order ensures IO source is unregistered before mainloop is destroyed.
---
## 🟠 High (4)
### ~~#2: PipeWire process callback can block indefinitely~~ ✅ Fixed
- **Location**: `src/cap_portal.rs:320`
- **Description**: `frame_tx.send(PwEvent::Frame(frame))` is called from the PipeWire `.process` callback on a bounded channel of size 3. If encoding stalls, this blocks the PipeWire callback and delays `stream.queue_raw_buffer(raw_buf)`.
- **Fix**: Use `try_send` and drop frames on full, or use an unbounded channel with a backpressure counter.
- **Fixed in**: [`a09a423`](commit:a09a423) — Replaced `send()` with `try_send()` + `AtomicU64` drop counter. PipeWire `.process` callback is now guaranteed non-blocking. Dropped frames are counted and logged every 30 occurrences.
### ~~#3: Portal PTS uses raw PipeWire timestamp as encoder frame-number timebase~~ ✅ Fixed
- **Location**: `src/state_portal.rs:177`, `src/cap_portal.rs:279`
- **Description**: PipeWire PTS (from `spa_meta_header.pts`) is in nanoseconds (CLOCK_MONOTONIC), not frame-count units. It was assigned directly to `AVFrame.pts` while the encoder time_base is `1/fps`, causing the muxer to interpret timestamps as billions of frames and producing corrupted duration metadata.
- **Fix**: Record the first frame's PipeWire PTS as a nanosecond base, compute elapsed nanoseconds for each subsequent frame, then convert to frame numbers via `elapsed_ns * fps / 1_000_000_000`. Using elapsed time avoids i64 overflow on absolute timestamps. Invalid/negative PTS values are clamped to 0.
- **Fixed in**: [`ffb36b7`](commit:ffb36b7) — Added `first_pts_ns: Option<i64>` to `StatePortal`. PTS conversion matches WLR path pattern (`state.rs:525-527`).
### ~~#4: AVDRMFrameDescriptor leaks if Portal encode fails~~ ✅ Fixed
- **Location**: `src/state_portal.rs:183`
- **Description**: The descriptor is allocated with `Box::into_raw` at line 120 and manually recovered at lines 187-191. But `enc.encode_frame(&hw_frame)?` returns early on error before recovery. This leaks the boxed `AVDRMFrameDescriptor`.
- **Fix**: Wrap the descriptor pointer in a small RAII guard that reclaims the box on every return path, or recover the box before propagating the encode error.
- **Fixed in**: [`2d448dc`](commit:2d448dc) — Moved `Box::from_raw` descriptor recovery from after `encode_frame` to before it. Since `av_hwframe_transfer_data` has already imported the DMA-BUF into the VAAPI surface by that point, the descriptor struct is safe to reclaim. Now `encode_frame`'s `?` early-return can no longer leak the descriptor.
### ~~#5: Portal can initialize encoder with zero or unknown format~~ ✅ Fixed
- **Location**: `src/cap_portal.rs:306`, `src/state_portal.rs:68`
- **Description**: `format_info.get().unwrap_or((0, 0, 0, 0))` permits sending a frame with width, height, and format set to zero before format negotiation has completed. `StatePortal` treats the first frame as authoritative and creates the encoder from those dimensions.
- **Fix**: Do not emit `PwEvent::Frame` until format info is present and `spa_to_drm_fourcc` returns nonzero; otherwise requeue or drop the buffer.
- **Fixed in**: [`b9e62d6`](commit:b9e62d6) — Replaced `unwrap_or((0,0,0,0))` with `format_info.get()` `Some`/`None` check plus explicit `width/height/format == 0` guard, placed *before* `libc::dup(fd)` to avoid wasting a fd on dropped frames. Also partially fixes #15 (unsupported SPA format → drm_fourcc=0 is now rejected at source).
---
## 🟡 Medium (4)
### #6: WLR timestamp offset is applied after rescaling using wrong units
- **Location**: `src/state.rs:525`, `src/avhw.rs:446`
- **Description**: WLR computes absolute PTS in encoder ticks, then `drain_encoder` rescales packet timestamps and subtracts `start_ts` afterward. If the muxer stream timebase changes after `avformat_write_header`, subtracting an encoder-timebase `start_ts` from stream-timebase packet timestamps is wrong.
- **Fix**: Subtract the first frame PTS before sending frames to the encoder/filter, or rescale `start_ts` with the same source and destination timebases before subtracting.
### #7: FFmpeg format context and IO leak on EncState::new error paths
- **Location**: `src/avhw.rs:256-323`
- **Description**: `EncState::new` allocates `AVFormatContext` with `avformat_alloc_output_context2`. After that, several `bail!` calls (lines 279, 285, 293, 319) can return before the raw pointer is wrapped by `Output::wrap` at line 323. This leaks `fmt_ctx_ptr` and, after `avio_open`, the opened `AVIOContext`.
- **Fix**: Introduce a local RAII guard for `AVFormatContext` immediately after allocation. The guard calls `avio_closep` if `pb` is non-null and `avformat_free_context` unless ownership has been transferred. Disarm after `wrap` succeeds.
### #8: FPS limiting happens after expensive WLR capture/allocation
- **Location**: `src/state.rs:383-549`
- **Description**: In the WLR path, `queue_alloc_frame` starts a screencopy request, `on_frame_allocd` allocates VAAPI surface, maps to DRM PRIME, duplicates DMA-BUF fds, creates Wayland buffer, and queues compositor copy. Only after the compositor signals ready does `on_copy_complete` apply FPS limiting. Frames that will be dropped still consume all these resources.
- **Fix**: Move FPS gating before `manager.capture_output` in `queue_alloc_frame`. Track `last_capture_requested` or `last_frame_emitted` as an `Instant`, skip queueing if target interval has not elapsed.
### #9: FpsLimit drops the second frame after every first encoded frame
- **Location**: `src/fps_limit.rs:19`, `src/state.rs:542`, `src/state_portal.rs:103`
- **Description**: Both state machines bypass `FpsLimit` for the first frame. Because that first frame is never fed into `FpsLimit`, the next frame calls `on_new_frame` with empty `on_deck`; it returns `None`, so the second frame is always dropped regardless of elapsed time.
- **Fix**: Replace `FpsLimit<T>` with a timestamp-only limiter that stores `last_emitted: Option<Instant>` and returns a boolean. On the first encoded frame, set `last_emitted = Some(now)`.
### #10: Intermediate stage left installed on fatal initialization errors
- **Location**: `src/state.rs:582, 724`
- **Description**: `negotiate_format` and `try_finalize_output` use `mem::replace(&mut self.stage, EncConstructionStage::Intermediate)`. If `create_encoder` fails or managers are missing, `self.errored` is set but `self.stage` remains `Intermediate`. Resources moved out are dropped implicitly, and future retry logic would operate on a meaningless state.
- **Fix**: Replace raw `mem::replace` transitions with a helper that has explicit commit/rollback semantics. Build all fallible resources before moving the stage. When unavoidable, restore the old stage on failure or install a dedicated `Failed` stage.
---
## 🔵 Low-Medium (5)
### #11: av_buffersrc_parameters_set failure can leak cloned hw_frames_ctx
- **Location**: `src/avhw.rs:538-557`
- **Description**: `frames_rgb.ref_clone()` is assigned to `par.hw_frames_ctx`. If `av_buffersrc_parameters_set` fails, the cloned `AVBufferRef` may not have transferred ownership, and the code does not call `av_buffer_unref` on it.
- **Fix**: Store the cloned reference in a local mutable pointer. On failure, call `av_buffer_unref(&mut cloned_ref)`.
### #12: Portal session lifetime may be too short
- **Location**: `src/cap_portal.rs:83-134`
- **Description**: `setup_portal` creates an ashpd Screencast proxy and session, returns only `(OwnedFd, node_id)`, and drops both proxy and session when it returns. If ashpd closes the screencast session on drop, capture may end immediately.
- **Fix**: Verify ashpd session drop semantics. If dropping the session closes the portal session, store the session handle inside `CapPortal` for the entire capture lifetime.
### #13: Portal descriptor uses borrowed DMA-BUF fd without clear lifetime contract across FFmpeg import
- **Location**: `src/state_portal.rs:217`
- **Description**: `build_drm_descriptor` stores `frame.fd.as_raw_fd()` into `desc.objects[0].fd`. The fd is owned by `PwDmaBufFrame`. If FFmpeg/VAAPI retains the fd beyond `av_hwframe_transfer_data`, `PwDmaBufFrame` dropping at end of `handle_pw_frame` can close an in-use fd.
- **Fix**: Duplicate the fd for the descriptor if FFmpeg import is not guaranteed synchronous. Create an RAII owner that contains both the boxed descriptor and `OwnedFd` duplicates.
### #14: WLR duplicate DMA-BUF fds passed to Wayland rely on subtle OwnedFd::as_fd ownership
- **Location**: `src/state.rs:463-482`
- **Description**: `on_frame_allocd` dups `obj.fd`, wraps with `OwnedFd::from_raw_fd`, and passes `fd_owned.as_fd()` into `params.add`. The comment says `params.add()` takes ownership, but code passes a `BorrowedFd`. `fd_owned` is dropped at end of loop iteration.
- **Fix**: Confirm the generated signature for `ZwpLinuxBufferParamsV1::add`. If ownership transfer is required, use the binding's owned-fd API or transfer with `into_raw_fd`.
### #15 (moved to Low): spa_to_drm_fourcc returns 0, downstream treats as usable
- **Location**: `src/cap_portal.rs:370-381`
- **Description**: Unsupported SPA video formats map to `0`. This value is stored in `format_info` and later used in `AVDRMFrameDescriptor.layers[0].format` with no rejection path.
- **Fix**: Make `spa_to_drm_fourcc` return `Option<u32>`. On `None`, send a fatal `PwEvent::Error` or renegotiate.
---
## ⚪ Low (8)
### #15: --fps 0 causes panic
- **Location**: `src/fps_limit.rs:12`, `src/avhw.rs:217`
- **Description**: `1.0 / 0.0u32` produces infinite duration. Encoder timebase `Rational::new(1, 0)` is also invalid.
- **Fix**: Validate `args.fps > 0` after parsing. Use a clap value parser.
### #16: to_str().unwrap() panics on non-UTF-8 paths
- **Location**: `src/avhw.rs:24, 255`
- **Description**: `drm_device.to_str().unwrap()` and `output_path.to_str().unwrap()` panic on non-UTF-8 paths.
- **Fix**: Replace with `path.to_str().ok_or_else(|| anyhow!("path is not valid UTF-8: {}", path.display()))?`.
### #17: Portal shutdown swallows encoder errors
- **Location**: `src/state_portal.rs:198-209`, `src/main.rs:227`
- **Description**: `StatePortal::shutdown()` catches and logs flush errors but returns `()`. `run_portal_pipewire` always returns `Ok(())` even if trailer writing failed.
- **Fix**: Change `shutdown` to return `Result<()>`. Propagate in `run_portal_pipewire`.
### #18: WLR shutdown discards buffered FPS-limiter frame
- **Location**: `src/main.rs:167`, `src/state.rs:546`
- **Description**: `state.fps_limit.flush()` return value is ignored. The limiter stores `S::Frame::default()` (not real frame data), so no actual frame is lost, but the design is misleading.
- **Fix**: Use timestamp-only limiter (see #9). If keeping FpsLimit, log the discarded flush value.
### #19: StatePortal drops first PipeWire frame after format negotiation
- **Location**: `src/state_portal.rs:56-82`
- **Description**: First `PwEvent::Frame` is used only for encoder creation, then `drop(frame)`. The first actual image is not encoded, creating a startup gap and interacting with FPS limiter issue #9.
- **Fix**: After encoder creation, pass the same frame to `handle_pw_frame(frame)` instead of dropping. Or set FPS limiter baseline correctly.
### #20: CapPortal::Drop can block indefinitely on join
- **Location**: `src/cap_portal.rs:137-144`
- **Description**: `Drop` sends `PwCmd::Shutdown` then unconditionally `join()`s. If the command channel send fails or `pw_main_loop_quit` doesn't wake the loop, `join` blocks forever.
- **Fix**: Replace helper thread with PW-loop-integrated wakeup. If keeping thread design, add a bounded shutdown timeout.
### #21: pw::deinit() called from worker thread with process-wide implications
- **Location**: `src/cap_portal.rs:155, 361`
- **Description**: `pw::init()`/`pw::deinit()` are process-global. Deinit inside a worker thread is fragile if future code adds another PipeWire user.
- **Fix**: Treat PW init as process-level lifecycle. Init once at startup, deinit once at shutdown, or omit explicit deinit.
### #22: PipeWire callback assumes first data item contains complete DMA-BUF image
- **Location**: `src/cap_portal.rs:260`
- **Description**: Only reads `datas[0]`. Multi-plane formats (NV12 etc.) would produce corrupted frames.
- **Fix**: Validate that negotiated DRM format is single-plane RGB/XRGB before emitting frames.
### #24: CapPortal stores Tokio runtime but not portal session objects
- **Location**: `src/cap_portal.rs:31`
- **Description**: `rt: Runtime` is stored but not used after `setup_portal`. The runtime does not keep `ashpd` proxy/session alive.
- **Fix**: Remove `rt` if not needed, or store actual session/proxy handles alongside it.
### #25: poll_and_encode collapses all channel errors into "no event"
- **Location**: `src/state_portal.rs:49-53`
- **Description**: `try_recv` errors are all mapped to `Ok(false)`. A disconnected channel (PipeWire thread crashed) is indistinguishable from an empty channel.
- **Fix**: Match on `RecvError` variant. On `Disconnected`, set `self.errored = true`.
---
## Summary
| Severity | Count |
|----------|-------|
| Critical | 1 |
| High | 4 |
| Medium | 5 |
| Low-Medium | 5 |
| Low | 10 |
| **Total** | **25** |
### Recommended Fix Priority
1. **#1** (Critical UAF) — PipeWire mainloop shutdown race
2. **#4** (High leak) — AVDRMFrameDescriptor leak on encode error
3. **#2** (High blocking) — PipeWire process callback blocking
4. **#5** (High validation) — Zero-format encoder init
5. **#3** (High PTS) — Portal PTS unit mismatch