🔴 [H4] Large-file generation does not use BufWriter, making benchmark data preparation extremely slow #35

Closed
opened 2026-06-05 11:53:22 +08:00 by dailz · 1 comment
Owner

Problem

generate_test_file writes about 74 million lines and generate_growable_file about 150,000 lines, but both call writeln! in a loop directly on a File instead of going through a BufWriter.

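Impact

Generating large files issues a huge number of small write syscalls. The 5GB test file in particular may be several orders of magnitude slower to produce, which undermines the usability of the benchmark.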
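
A minimal sketch of why buffering matters here, assuming a hypothetical CountingSink that counts calls to write (on a raw File, each such call would be one write syscall). The line format and all names are illustrative, not taken from data_gen.rs.

```rust
use std::io::{self, BufWriter, Write};

/// Hypothetical sink standing in for a raw `File`: it records how many
/// times `write` is called and how many bytes it receives.
#[derive(Default)]
struct CountingSink {
    calls: usize,
    bytes: usize,
}

impl Write for CountingSink {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.calls += 1;
        self.bytes += buf.len();
        Ok(buf.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// Writes `n` log-like lines straight into the sink, with no buffering.
/// Returns (write calls, bytes written).
fn write_unbuffered(n: usize) -> io::Result<(usize, usize)> {
    let mut sink = CountingSink::default();
    for i in 0..n {
        writeln!(sink, "2026-06-05 11:53:22 INFO line {i}")?;
    }
    Ok((sink.calls, sink.bytes))
}

/// Writes the same lines through a 64 KiB BufWriter.
/// Returns (write calls that reached the sink, bytes written).
fn write_buffered(n: usize) -> io::Result<(usize, usize)> {
    let mut out = BufWriter::with_capacity(64 * 1024, CountingSink::default());
    for i in 0..n {
        writeln!(out, "2026-06-05 11:53:22 INFO line {i}")?;
    }
    out.flush()?;
    let sink = out.get_ref();
    Ok((sink.calls, sink.bytes))
}
```

With 10,000 lines both paths deliver the same bytes, but the unbuffered path makes at least one write call per line (writeln! can even split a single formatted line into several writes), while the BufWriter path needs only a handful.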

Location

  • crates/bench/src/data_gen.rs:54
  • crates/bench/src/data_gen.rs:75

Suggestion

Wrap the output file in std::io::BufWriter::with_capacity(64 * 1024, file) and flush it at the end.

Author
Owner

Fixed in 6dd87d2.

Root cause: writeln! was called directly on the raw File, so every line (~70 bytes) triggered at least one separate write syscall.

Fix: wrapped the output File in each of the three write-heavy functions with BufWriter::with_capacity(64 * 1024, ...), i.e. a 64KB buffer:

  • generate_test_file (74M lines / ~5GB)
  • generate_growable_file (150K lines / ~10MB)
  • append_lines (caller-determined)
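
For append_lines, where the caller decides how many lines to write, a sketch might look like this. It assumes the function opens the file in append mode; the actual signature is not given here, and start and count are hypothetical parameters.

```rust
use std::fs::OpenOptions;
use std::io::{self, BufWriter, Write};
use std::path::Path;

/// Hypothetical shape of `append_lines`: the caller picks how many lines to
/// append, and the file is opened in append mode so existing content
/// (e.g. a growable file being tailed) is preserved.
fn append_lines(path: &Path, start: u64, count: u64) -> io::Result<()> {
    let file = OpenOptions::new().create(true).append(true).open(path)?;
    let mut out = BufWriter::with_capacity(64 * 1024, file);
    for i in start..start + count {
        writeln!(out, "2026-06-05 14:02:16 INFO appended {i}")?;
    }
    // Surface any error from writing out the last partial buffer.
    out.flush()?;
    Ok(())
}
```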

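Added an explicit flush()? followed by drop(file) before any subsequent reads, so the buffered data is visible to the reader and flush errors are propagated (BufWriter's Drop also flushes, but silently discards errors).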
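
The flush-then-drop ordering before a read-back can be sketched as follows; write_then_read is a hypothetical helper, not a function from the repository.

```rust
use std::fs::File;
use std::io::{self, BufRead, BufReader, BufWriter, Write};
use std::path::Path;

/// Hypothetical helper: writes `lines` lines through a BufWriter, then
/// reopens the file and returns how many lines a reader sees.
fn write_then_read(path: &Path, lines: usize) -> io::Result<usize> {
    let mut file = BufWriter::with_capacity(64 * 1024, File::create(path)?);
    for i in 0..lines {
        writeln!(file, "line {i}")?;
    }
    // If the file were reopened while `file` is still alive and unflushed,
    // up to 64 KiB of lines could still be sitting in memory and the reader
    // would see a truncated file. `?` propagates any error from this write.
    file.flush()?;
    // Dropping the BufWriter closes the File handle. Drop would also flush,
    // but it discards errors, which is why flush()? comes first.
    drop(file);

    let reader = BufReader::new(File::open(path)?);
    let mut count = 0;
    for line in reader.lines() {
        line?;
        count += 1;
    }
    Ok(count)
}
```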

dailz closed this issue 2026-06-05 14:02:16 +08:00
Reference: dailz/logViewer#35