203
.sisyphus/BUGS.md
Normal file
203
.sisyphus/BUGS.md
Normal file
@@ -0,0 +1,203 @@
|
||||
# wl-webrtc Bug Analysis Report
|
||||
|
||||
> Generated by Oracle audit on 2026-05-22. Total: 25 bugs.
|
||||
|
||||
## Severity Legend
|
||||
|
||||
- 🔴 **Critical** — Crash, UB, or data corruption
|
||||
- 🟠 **High** — Incorrect behavior, resource leak under normal usage
|
||||
- 🟡 **Medium** — Wasted resources, logic errors with limited impact
|
||||
- 🔵 **Low-Medium** — Fragile patterns, may cause issues under specific conditions
|
||||
- ⚪ **Low** — Input validation, code smell, future-proofing
|
||||
|
||||
---
|
||||
|
||||
## 🔴 Critical (1)
|
||||
|
||||
### ~~#1: PipeWire helper thread Use-After-Free on mainloop quit~~ ✅ Fixed
|
||||
|
||||
- **Location**: `src/cap_portal.rs:348-357, 359-363`
|
||||
- **Description**: `mainloop.as_raw_ptr()` is cast to `usize`, an untracked thread is spawned that calls `pw_main_loop_quit` through that raw pointer. If `mainloop.run()` returns or the PipeWire thread exits before `CapPortal::drop` sends shutdown, the helper can outlive `mainloop` and call into freed memory.
|
||||
- **Fix**: Remove the raw-pointer helper thread. Use a PipeWire loop event/eventfd/timer registered on the PipeWire loop, or store and join the helper thread with a lifetime guarantee that it exits before `mainloop` is dropped.
|
||||
- **Fixed in**: [`e40ef9e`](commit:e40ef9e) — Replaced detached helper thread + raw pointer with eventfd registered via `add_io()` on the PipeWire loop. Shutdown callback runs on the loop thread during `mainloop.run()`, guaranteeing mainloop is alive. Drop order ensures IO source is unregistered before mainloop is destroyed.
|
||||
|
||||
---
|
||||
|
||||
## 🟠 High (4)
|
||||
|
||||
### #2: PipeWire process callback can block indefinitely
|
||||
|
||||
- **Location**: `src/cap_portal.rs:320`
|
||||
- **Description**: `frame_tx.send(PwEvent::Frame(frame))` is called from the PipeWire `.process` callback on a bounded channel of size 3. If encoding stalls, this blocks the PipeWire callback and delays `stream.queue_raw_buffer(raw_buf)`.
|
||||
- **Fix**: Use `try_send` and drop frames on full, or use an unbounded channel with a backpressure counter.
|
||||
|
||||
### #3: Portal PTS uses raw PipeWire timestamp as encoder frame-number timebase
|
||||
|
||||
- **Location**: `src/state_portal.rs:177`, `src/cap_portal.rs:279`
|
||||
- **Description**: PipeWire PTS (from `spa_meta_header.pts`) is not guaranteed to be in frame-count units — it may be nanoseconds or an invalid sentinel. It is assigned directly to `AVFrame.pts` while the encoder time_base is `1/fps`.
|
||||
- **Fix**: Normalize timestamps at the Portal boundary: store the first valid PipeWire PTS, convert elapsed time to encoder ticks with `elapsed_ns * fps / 1_000_000_000`, and synthesize monotonic frame numbers when PTS is invalid.
|
||||
|
||||
### #4: AVDRMFrameDescriptor leaks if Portal encode fails
|
||||
|
||||
- **Location**: `src/state_portal.rs:183`
|
||||
- **Description**: The descriptor is allocated with `Box::into_raw` at line 120 and manually recovered at lines 187-191. But `enc.encode_frame(&hw_frame)?` returns early on error before recovery. This leaks the boxed `AVDRMFrameDescriptor`.
|
||||
- **Fix**: Wrap the descriptor pointer in a small RAII guard that reclaims the box on every return path, or recover the box before propagating the encode error.
|
||||
|
||||
### #5: Portal can initialize encoder with zero or unknown format
|
||||
|
||||
- **Location**: `src/cap_portal.rs:306`, `src/state_portal.rs:68`
|
||||
- **Description**: `format_info.get().unwrap_or((0, 0, 0, 0))` permits sending a frame with width, height, and format set to zero before format negotiation has completed. `StatePortal` treats the first frame as authoritative and creates the encoder from those dimensions.
|
||||
- **Fix**: Do not emit `PwEvent::Frame` until format info is present and `spa_to_drm_fourcc` returns nonzero; otherwise requeue or drop the buffer.
|
||||
|
||||
---
|
||||
|
||||
## 🟡 Medium (4)
|
||||
|
||||
### #6: WLR timestamp offset is applied after rescaling using wrong units
|
||||
|
||||
- **Location**: `src/state.rs:525`, `src/avhw.rs:446`
|
||||
- **Description**: WLR computes absolute PTS in encoder ticks, then `drain_encoder` rescales packet timestamps and subtracts `start_ts` afterward. If the muxer stream timebase changes after `avformat_write_header`, subtracting an encoder-timebase `start_ts` from stream-timebase packet timestamps is wrong.
|
||||
- **Fix**: Subtract the first frame PTS before sending frames to the encoder/filter, or rescale `start_ts` with the same source and destination timebases before subtracting.
|
||||
|
||||
### #7: FFmpeg format context and IO leak on EncState::new error paths
|
||||
|
||||
- **Location**: `src/avhw.rs:256-323`
|
||||
- **Description**: `EncState::new` allocates `AVFormatContext` with `avformat_alloc_output_context2`. After that, several `bail!` calls (lines 279, 285, 293, 319) can return before the raw pointer is wrapped by `Output::wrap` at line 323. This leaks `fmt_ctx_ptr` and, after `avio_open`, the opened `AVIOContext`.
|
||||
- **Fix**: Introduce a local RAII guard for `AVFormatContext` immediately after allocation. The guard calls `avio_closep` if `pb` is non-null and `avformat_free_context` unless ownership has been transferred. Disarm after `wrap` succeeds.
|
||||
|
||||
### #8: FPS limiting happens after expensive WLR capture/allocation
|
||||
|
||||
- **Location**: `src/state.rs:383-549`
|
||||
- **Description**: In the WLR path, `queue_alloc_frame` starts a screencopy request, `on_frame_allocd` allocates VAAPI surface, maps to DRM PRIME, duplicates DMA-BUF fds, creates Wayland buffer, and queues compositor copy. Only after the compositor signals ready does `on_copy_complete` apply FPS limiting. Frames that will be dropped still consume all these resources.
|
||||
- **Fix**: Move FPS gating before `manager.capture_output` in `queue_alloc_frame`. Track `last_capture_requested` or `last_frame_emitted` as an `Instant`, skip queueing if target interval has not elapsed.
|
||||
|
||||
### #9: FpsLimit drops the second frame after every first encoded frame
|
||||
|
||||
- **Location**: `src/fps_limit.rs:19`, `src/state.rs:542`, `src/state_portal.rs:103`
|
||||
- **Description**: Both state machines bypass `FpsLimit` for the first frame. Because that first frame is never fed into `FpsLimit`, the next frame calls `on_new_frame` with empty `on_deck`; it returns `None`, so the second frame is always dropped regardless of elapsed time.
|
||||
- **Fix**: Replace `FpsLimit<T>` with a timestamp-only limiter that stores `last_emitted: Option<Instant>` and returns a boolean. On the first encoded frame, set `last_emitted = Some(now)`.
|
||||
|
||||
### #10: Intermediate stage left installed on fatal initialization errors
|
||||
|
||||
- **Location**: `src/state.rs:582, 724`
|
||||
- **Description**: `negotiate_format` and `try_finalize_output` use `mem::replace(&mut self.stage, EncConstructionStage::Intermediate)`. If `create_encoder` fails or managers are missing, `self.errored` is set but `self.stage` remains `Intermediate`. Resources moved out are dropped implicitly, and future retry logic would operate on a meaningless state.
|
||||
- **Fix**: Replace raw `mem::replace` transitions with a helper that has explicit commit/rollback semantics. Build all fallible resources before moving the stage. When unavoidable, restore the old stage on failure or install a dedicated `Failed` stage.
|
||||
|
||||
---
|
||||
|
||||
## 🔵 Low-Medium (5)
|
||||
|
||||
### #11: av_buffersrc_parameters_set failure can leak cloned hw_frames_ctx
|
||||
|
||||
- **Location**: `src/avhw.rs:538-557`
|
||||
- **Description**: `frames_rgb.ref_clone()` is assigned to `par.hw_frames_ctx`. If `av_buffersrc_parameters_set` fails, the cloned `AVBufferRef` may not have transferred ownership, and the code does not call `av_buffer_unref` on it.
|
||||
- **Fix**: Store the cloned reference in a local mutable pointer. On failure, call `av_buffer_unref(&mut cloned_ref)`.
|
||||
|
||||
### #12: Portal session lifetime may be too short
|
||||
|
||||
- **Location**: `src/cap_portal.rs:83-134`
|
||||
- **Description**: `setup_portal` creates an ashpd Screencast proxy and session, returns only `(OwnedFd, node_id)`, and drops both proxy and session when it returns. If ashpd closes the screencast session on drop, capture may end immediately.
|
||||
- **Fix**: Verify ashpd session drop semantics. If dropping the session closes the portal session, store the session handle inside `CapPortal` for the entire capture lifetime.
|
||||
|
||||
### #13: Portal descriptor uses borrowed DMA-BUF fd without clear lifetime contract across FFmpeg import
|
||||
|
||||
- **Location**: `src/state_portal.rs:217`
|
||||
- **Description**: `build_drm_descriptor` stores `frame.fd.as_raw_fd()` into `desc.objects[0].fd`. The fd is owned by `PwDmaBufFrame`. If FFmpeg/VAAPI retains the fd beyond `av_hwframe_transfer_data`, `PwDmaBufFrame` dropping at end of `handle_pw_frame` can close an in-use fd.
|
||||
- **Fix**: Duplicate the fd for the descriptor if FFmpeg import is not guaranteed synchronous. Create an RAII owner that contains both the boxed descriptor and `OwnedFd` duplicates.
|
||||
|
||||
### #14: WLR duplicate DMA-BUF fds passed to Wayland rely on subtle OwnedFd::as_fd ownership
|
||||
|
||||
- **Location**: `src/state.rs:463-482`
|
||||
- **Description**: `on_frame_allocd` dups `obj.fd`, wraps with `OwnedFd::from_raw_fd`, and passes `fd_owned.as_fd()` into `params.add`. The comment says `params.add()` takes ownership, but code passes a `BorrowedFd`. `fd_owned` is dropped at end of loop iteration.
|
||||
- **Fix**: Confirm the generated signature for `ZwpLinuxBufferParamsV1::add`. If ownership transfer is required, use the binding's owned-fd API or transfer with `into_raw_fd`.
|
||||
|
||||
### #15 (moved to Low): spa_to_drm_fourcc returns 0, downstream treats as usable
|
||||
|
||||
- **Location**: `src/cap_portal.rs:370-381`
|
||||
- **Description**: Unsupported SPA video formats map to `0`. This value is stored in `format_info` and later used in `AVDRMFrameDescriptor.layers[0].format` with no rejection path.
|
||||
- **Fix**: Make `spa_to_drm_fourcc` return `Option<u32>`. On `None`, send a fatal `PwEvent::Error` or renegotiate.
|
||||
|
||||
---
|
||||
|
||||
## ⚪ Low (8)
|
||||
|
||||
### #15: --fps 0 causes panic
|
||||
|
||||
- **Location**: `src/fps_limit.rs:12`, `src/avhw.rs:217`
|
||||
- **Description**: `1.0 / 0.0u32` produces infinite duration. Encoder timebase `Rational::new(1, 0)` is also invalid.
|
||||
- **Fix**: Validate `args.fps > 0` after parsing. Use a clap value parser.
|
||||
|
||||
### #16: to_str().unwrap() panics on non-UTF-8 paths
|
||||
|
||||
- **Location**: `src/avhw.rs:24, 255`
|
||||
- **Description**: `drm_device.to_str().unwrap()` and `output_path.to_str().unwrap()` panic on non-UTF-8 paths.
|
||||
- **Fix**: Replace with `path.to_str().ok_or_else(|| anyhow!("path is not valid UTF-8: {}", path.display()))?`.
|
||||
|
||||
### #17: Portal shutdown swallows encoder errors
|
||||
|
||||
- **Location**: `src/state_portal.rs:198-209`, `src/main.rs:227`
|
||||
- **Description**: `StatePortal::shutdown()` catches and logs flush errors but returns `()`. `run_portal_pipewire` always returns `Ok(())` even if trailer writing failed.
|
||||
- **Fix**: Change `shutdown` to return `Result<()>`. Propagate in `run_portal_pipewire`.
|
||||
|
||||
### #18: WLR shutdown discards buffered FPS-limiter frame
|
||||
|
||||
- **Location**: `src/main.rs:167`, `src/state.rs:546`
|
||||
- **Description**: `state.fps_limit.flush()` return value is ignored. The limiter stores `S::Frame::default()` (not real frame data), so no actual frame is lost, but the design is misleading.
|
||||
- **Fix**: Use timestamp-only limiter (see #9). If keeping FpsLimit, log the discarded flush value.
|
||||
|
||||
### #19: StatePortal drops first PipeWire frame after format negotiation
|
||||
|
||||
- **Location**: `src/state_portal.rs:56-82`
|
||||
- **Description**: First `PwEvent::Frame` is used only for encoder creation, then `drop(frame)`. The first actual image is not encoded, creating a startup gap and interacting with FPS limiter issue #9.
|
||||
- **Fix**: After encoder creation, pass the same frame to `handle_pw_frame(frame)` instead of dropping. Or set FPS limiter baseline correctly.
|
||||
|
||||
### #20: CapPortal::Drop can block indefinitely on join
|
||||
|
||||
- **Location**: `src/cap_portal.rs:137-144`
|
||||
- **Description**: `Drop` sends `PwCmd::Shutdown` then unconditionally `join()`s. If the command channel send fails or `pw_main_loop_quit` doesn't wake the loop, `join` blocks forever.
|
||||
- **Fix**: Replace helper thread with PW-loop-integrated wakeup. If keeping thread design, add a bounded shutdown timeout.
|
||||
|
||||
### #21: pw::deinit() called from worker thread with process-wide implications
|
||||
|
||||
- **Location**: `src/cap_portal.rs:155, 361`
|
||||
- **Description**: `pw::init()`/`pw::deinit()` are process-global. Deinit inside a worker thread is fragile if future code adds another PipeWire user.
|
||||
- **Fix**: Treat PW init as process-level lifecycle. Init once at startup, deinit once at shutdown, or omit explicit deinit.
|
||||
|
||||
### #22: PipeWire callback assumes first data item contains complete DMA-BUF image
|
||||
|
||||
- **Location**: `src/cap_portal.rs:260`
|
||||
- **Description**: Only reads `datas[0]`. Multi-plane formats (NV12 etc.) would produce corrupted frames.
|
||||
- **Fix**: Validate that negotiated DRM format is single-plane RGB/XRGB before emitting frames.
|
||||
|
||||
### #24: CapPortal stores Tokio runtime but not portal session objects
|
||||
|
||||
- **Location**: `src/cap_portal.rs:31`
|
||||
- **Description**: `rt: Runtime` is stored but not used after `setup_portal`. The runtime does not keep `ashpd` proxy/session alive.
|
||||
- **Fix**: Remove `rt` if not needed, or store actual session/proxy handles alongside it.
|
||||
|
||||
### #25: poll_and_encode collapses all channel errors into "no event"
|
||||
|
||||
- **Location**: `src/state_portal.rs:49-53`
|
||||
- **Description**: `try_recv` errors are all mapped to `Ok(false)`. A disconnected channel (PipeWire thread crashed) is indistinguishable from an empty channel.
|
||||
- **Fix**: Match on `RecvError` variant. On `Disconnected`, set `self.errored = true`.
|
||||
|
||||
---
|
||||
|
||||
## Summary
|
||||
|
||||
| Severity | Count |
|
||||
|----------|-------|
|
||||
| Critical | 1 |
|
||||
| High | 4 |
|
||||
| Medium | 5 |
|
||||
| Low-Medium | 5 |
|
||||
| Low | 10 |
|
||||
| **Total** | **25** |
|
||||
|
||||
### Recommended Fix Priority
|
||||
|
||||
1. **#1** (Critical UAF) — PipeWire mainloop shutdown race
|
||||
2. **#4** (High leak) — AVDRMFrameDescriptor leak on encode error
|
||||
3. **#2** (High blocking) — PipeWire process callback blocking
|
||||
4. **#5** (High validation) — Zero-format encoder init
|
||||
5. **#3** (High PTS) — Portal PTS unit mismatch
|
||||
Reference in New Issue
Block a user