# Pickle Chunked Reading Benchmark

This benchmark measures the performance impact of the chunked reading optimization in GH PR #119204 for the pickle module.

## What This Tests

The PR adds chunked reading (1MB chunks) to prevent memory exhaustion when unpickling large objects:

- **BINBYTES8** - Large bytes objects (protocol 4+)
- **BINUNICODE8** - Large strings (protocol 4+)
- **BYTEARRAY8** - Large bytearrays (protocol 5)
- **FRAME** - Large frames
- **LONG4** - Large integers

There is also an antagonistic mode that exercises malicious pickles designed to induce memory-exhaustion denial of service.

## Quick Start

```bash
# Run the full benchmark suite (1MiB → 200MiB, takes several minutes)
build/python Tools/picklebench/memory_dos_impact.py

# Test just a few sizes (quick test: 1, 10, 50 MiB)
build/python Tools/picklebench/memory_dos_impact.py --sizes 1 10 50

# Test a smaller range for faster results
build/python Tools/picklebench/memory_dos_impact.py --sizes 1 5 10

# Output as Markdown for reports
build/python Tools/picklebench/memory_dos_impact.py --format markdown > results.md

# Test with protocol 4 instead of 5
build/python Tools/picklebench/memory_dos_impact.py --protocol 4
```

**Note:** Sizes are specified in MiB. Use `--sizes 1 2 5` for 1MiB, 2MiB, and 5MiB objects.
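The antagonistic mode mentioned above relies on pickles whose opcodes claim far more data than they actually supply. The sketch below hand-crafts one such pickle; the claimed and supplied sizes here are arbitrary illustrative choices, not the benchmark's defaults:

```python
import pickle

# PROTO 4, then BINBYTES8 (opcode 0x8e) with an 8-byte little-endian
# length that claims 10 MiB while only 1 KiB of payload follows.
claimed = 10 * 1024 * 1024
evil = (
    b"\x80\x04"                      # PROTO 4
    + b"\x8e"                        # BINBYTES8
    + claimed.to_bytes(8, "little")  # claimed payload length
    + b"A" * 1024                    # only 1 KiB actually supplied
)

try:
    pickle.loads(evil)
except Exception as exc:  # an UnpicklingError on current CPython builds
    print(f"{type(exc).__name__}: {exc}")
```

Without chunked reading, the unpickler allocates the full claimed size before discovering the truncation; with it, the allocation is capped per chunk and the load fails fast.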
## Antagonistic Mode (DoS Protection Test)

The `--antagonistic` flag tests **malicious pickles** that demonstrate the memory DoS protection:

```bash
# Quick DoS protection test (claims 10, 50, 100 MB but provides 1KB)
build/python Tools/picklebench/memory_dos_impact.py --antagonistic --sizes 10 50 100

# Full DoS test (default: 10, 50, 100, 500, 1000, 5000 MB claimed)
build/python Tools/picklebench/memory_dos_impact.py --antagonistic
```

### What This Tests

Unlike the normal benchmarks, which test **legitimate pickles**, antagonistic mode tests:

- **Truncated BINBYTES8**: Claims 100MB but provides only 1KB (will fail to unpickle)
- **Truncated BINUNICODE8**: Same for strings
- **Truncated BYTEARRAY8**: Same for bytearrays
- **Sparse memo attacks**: PUT at index 1 billion (which would allocate a huge array before the PR)

**Key difference:**

- **Normal mode**: Tests real data, shows ~5% time overhead
- **Antagonistic mode**: Tests malicious data, shows ~99% memory savings

### Expected Results

```
100MB Claimed (actual: 1KB)
  binbytes8_100MB_claim
    Peak memory: 1.00 MB (claimed: 100 MB, saved: 99.00 MB, 99.0%)
    Error: UnpicklingError  ← Expected!

Summary:
  Average claimed: 126.2 MB
  Average peak: 0.54 MB
  Average saved: 125.7 MB (99.6% reduction)

Protection Status:
  ✓ Memory DoS attacks mitigated by chunked reading
```

**Before PR**: Would allocate the full claimed size (100MB+), potentially crashing the process.
**After PR**: Allocates 1MB chunks and fails fast with minimal memory use.

This demonstrates the **security improvement**: protection against memory exhaustion attacks.

## Before/After Comparison

The benchmark includes an automatic comparison feature that runs the same tests on both a baseline and a current Python build.

### Option 1: Automatic Comparison (Recommended)

Build both versions, then use `--baseline` to compare automatically:

```bash
# Build the baseline (main branch without the PR)
git checkout main
mkdir -p build-main
cd build-main && ../configure && make -j $(nproc) && cd ..
```
```bash
# Build the current version (with the PR)
git checkout unpickle-overallocate
mkdir -p build
cd build && ../configure && make -j $(nproc) && cd ..

# Run the automatic comparison (quick test with a few sizes)
build/python Tools/picklebench/memory_dos_impact.py \
    --baseline build-main/python \
    --sizes 1 10 50

# Full comparison (all default sizes)
build/python Tools/picklebench/memory_dos_impact.py \
    --baseline build-main/python
```

The comparison output shows:

- Side-by-side metrics (Current vs Baseline)
- Percentage change for time and memory
- Overall summary statistics

### Interpreting Comparison Results

- **Time change**: A small positive % is expected (chunking adds overhead, typically 5-10%)
- **Memory change**: A negative % is good (chunking saves memory, especially for large objects)
- **Trade-off**: Slightly slower, but much safer against memory exhaustion attacks

### Option 2: Manual Comparison

Save results separately and compare manually:

```bash
# Baseline results
build-main/python Tools/picklebench/memory_dos_impact.py --format json > baseline.json

# Current results
build/python Tools/picklebench/memory_dos_impact.py --format json > current.json

# Manual comparison
diff -y <(jq '.' baseline.json) <(jq '.' current.json)
```

## Understanding the Results

### Critical Sizes

The default test suite includes:

- **< 1MiB (999,000 bytes)**: No chunking; the full size is allocated upfront
- **= 1MiB (1,048,576 bytes)**: The threshold, where chunking just starts
- **> 1MiB (1,048,577 bytes)**: Chunked reading engaged
- **1, 2, 5, 10MiB**: Show scaling behavior with chunking
- **20, 50, 100, 200MiB**: Stress test large object handling

**Note:** The full suite may require more than 16GiB of RAM.
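The per-test numbers boil down to wall-clock time and peak allocation during `pickle.loads`. A single measurement in that spirit can be taken with `time.perf_counter` and `tracemalloc` (a sketch only; the benchmark tool's actual harness may differ, and the 8 MiB payload size is an arbitrary choice):

```python
import pickle
import time
import tracemalloc

# Build a legitimate ~8 MiB payload, then measure one unpickling pass.
payload = {"blob": b"x" * (8 * 1024 * 1024)}
data = pickle.dumps(payload, protocol=5)

tracemalloc.start()
start = time.perf_counter()
obj = pickle.loads(data)
elapsed_ms = (time.perf_counter() - start) * 1000
_, peak = tracemalloc.get_traced_memory()  # (current, peak) in bytes
tracemalloc.stop()

print(f"time: {elapsed_ms:.2f} ms, peak: {peak / 2**20:.2f} MiB")
```

Note that `tracemalloc` adds its own overhead, so timing and memory are best measured in separate runs, as stdev across iterations will otherwise be inflated.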
### Key Metrics

- **Time (mean)**: Average unpickling time - should be similar before and after
- **Time (stdev)**: Consistency - lower is better
- **Peak Memory**: Maximum memory during unpickling - **expected to be LOWER after the PR**
- **Pickle Size**: Size of the serialized data on disk

### Test Types

| Test | What It Stresses |
|------|------------------|
| `bytes_*` | BINBYTES8 opcode, raw binary data |
| `string_ascii_*` | BINUNICODE8 with simple ASCII |
| `string_utf8_*` | BINUNICODE8 with multibyte UTF-8 (€ chars) |
| `bytearray_*` | BYTEARRAY8 opcode (protocol 5) |
| `list_large_items_*` | Multiple chunked reads in sequence |
| `dict_large_values_*` | Chunking in dict deserialization |
| `nested_*` | Realistic mixed data structures |
| `tuple_*` | Immutable structures |

## Expected Results

### Before PR (main branch)

- Single large allocation per object
- Risk of memory exhaustion with malicious pickles

### After PR (unpickle-overallocate branch)

- Chunked allocation (1MB at a time)
- **Slightly higher CPU time** (multiple allocations plus resizing)
- **Significantly lower peak memory** (no large pre-allocation)
- Protection against DoS via memory exhaustion

## Advanced Usage

### Test Specific Sizes

```bash
# Test only 5MiB and 10MiB objects
build/python Tools/picklebench/memory_dos_impact.py --sizes 5 10

# Test large objects: 50, 100, 200 MiB
build/python Tools/picklebench/memory_dos_impact.py --sizes 50 100 200
```

### More Iterations for Stable Timing

```bash
# Run 10 iterations per test for better statistics
build/python Tools/picklebench/memory_dos_impact.py --iterations 10 --sizes 1 10
```

### JSON Output for Analysis

```bash
# Generate JSON for programmatic analysis
build/python Tools/picklebench/memory_dos_impact.py --format json | python -m json.tool
```

## Interpreting Memory Results

The **peak memory** metric shows the maximum memory allocated during unpickling:

- **Without chunking**: Allocates the full size immediately - a 10MB object means a 10MB allocation upfront
- **With chunking**: Allocates in 1MB chunks and grows geometrically - a 10MB object starts at 1MB, then grows to 2MB, 4MB, 8MB (final: ~10MB total)
- Peak is lower because allocation is incremental

## Typical Results

On a system with the PR applied, you should see:

```
1.00MiB Test Results
  bytes_1.00MiB: ~0.3ms, 1.00MiB peak (just at the threshold)

2.00MiB Test Results
  bytes_2.00MiB: ~0.8ms, 2.00MiB peak (chunked: 1MiB → 2MiB)

10.00MiB Test Results
  bytes_10.00MiB: ~3-5ms, 10.00MiB peak (chunked: 1→2→4→8→10 MiB)
```

Time overhead is minimal (~10-20% for very large objects), but memory safety is significantly improved.
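The geometric growth pattern described under "Interpreting Memory Results" can be sketched as follows, assuming the policy is "start at 1 MiB and double, capped at the claimed total" (the PR's actual buffer-growth logic may differ in detail):

```python
MIB = 1024 * 1024

def growth_steps(total, start=1 * MIB):
    """Return the sequence of buffer sizes under start-and-double growth."""
    size = start
    steps = [size]
    while size < total:
        size = min(size * 2, total)  # double, but never past the claimed total
        steps.append(size)
    return steps

# A 10 MiB object grows 1 → 2 → 4 → 8 → 10 MiB, matching the docs above.
print([s // MIB for s in growth_steps(10 * MIB)])
```

Because each step only commits another bounded chunk, a truncated malicious pickle is caught well before the claimed total is ever allocated.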