Performance Benchmarks¶
This document contains detailed benchmark results for various components of the portfolio management system.
Benchmark Environment¶
All benchmarks run in standardized conditions:
- Hardware: GitHub Actions runners (2-core, 7GB RAM)
- Python: 3.12.x
- Pandas: 2.3+
- NumPy: 2.0+
- Methodology: Multiple runs with statistical analysis
Asset Selection Benchmarks¶
Vectorization Improvements¶
Dataset: 10,000 row selection DataFrame
| Operation | Before (ms) | After (ms) | Speedup | Improvement |
|---|---|---|---|---|
| Basic filtering | 3871.2 | 52.77 | 73.4x | 98.6% |
| Complex filtering | 1389.1 | 17.70 | 78.5x | 98.7% |
| Severity filtering | 2171.3 | 41.88 | 51.8x | 98.1% |
| Allow/blocklist | 4989.2 | 24.17 | 206.4x | 99.5% |
Scalability:
- Time per asset: 5.3µs (constant across all sizes)
- Memory per asset: 8.5KB (constant)
- Scaling: O(n) linear
Selection Performance by Universe Size¶
| Universe Size | Time (ms) | Memory (MB) | Assets/sec |
|---|---|---|---|
| 100 | 0.53 | 0.85 | 188,679 |
| 500 | 2.65 | 4.25 | 188,679 |
| 1000 | 5.30 | 8.50 | 188,679 |
| 5000 | 26.50 | 42.50 | 188,679 |
| 10000 | 53.00 | 85.00 | 188,679 |
Key Finding: Perfect linear scaling with constant per-asset performance
Preselection Benchmarks¶
Factor Computation by Method¶
Configuration: 1000 assets, 252-day lookback, single rebalance
| Method | Time (ms) | Memory (MB) | Std Dev (ms) |
|---|---|---|---|
| Momentum | 68.3 | 85.2 | 3.1 |
| Low Volatility | 71.2 | 87.4 | 3.4 |
| Combined (50/50) | 79.5 | 92.1 | 3.8 |
Observations:
- Combined method adds 16% overhead
- All methods complete in \<100ms for 1000 assets
- Memory usage increases ~8% for combined
Scalability by Universe Size¶
Configuration: Momentum method, 252-day lookback
| Assets | Time (ms) | Memory (MB) | Time/Asset (µs) | Scaling |
|---|---|---|---|---|
| 100 | 8.2 | 12.3 | 82.0 | 1.0x |
| 250 | 18.5 | 24.8 | 74.0 | 2.3x |
| 500 | 35.1 | 44.9 | 70.2 | 4.3x |
| 1000 | 68.3 | 85.2 | 68.3 | 8.3x |
| 2500 | 165.2 | 195.1 | 66.1 | 20.1x |
| 5000 | 327.8 | 379.8 | 65.6 | 40.0x |
Key Finding: Near-perfect linear scaling (O(n) = 66µs per asset)
Lookback Period Impact¶
Configuration: 1000 assets, momentum method
| Lookback (days) | Time (ms) | Memory (MB) | Overhead vs 30d |
|---|---|---|---|
| 30 | 65.1 | 78.2 | - (baseline) |
| 63 | 66.8 | 82.1 | +2.6% |
| 126 | 67.9 | 85.0 | +4.3% |
| 252 | 68.3 | 85.2 | +4.9% |
| 504 | 71.2 | 91.8 | +9.4% |
Key Finding: Lookback period has minimal impact (\<10% even at 2 years)
Multiple Rebalancing Performance¶
Scenario: Monthly rebalancing over 2 years (24 rebalances)
| Assets | Total Time (s) | Per Rebalance (ms) | Throughput (rebal/s) |
|---|---|---|---|
| 100 | 0.20 | 8.3 | 120.5 |
| 250 | 0.44 | 18.3 | 54.6 |
| 500 | 0.84 | 35.0 | 28.6 |
| 1000 | 1.64 | 68.3 | 14.6 |
| 2500 | 3.96 | 165.0 | 6.1 |
| 5000 | 7.87 | 328.0 | 3.0 |
Key Finding: 24 monthly rebalances with 1000 assets completes in 1.6 seconds
Backtest Engine Benchmarks¶
Lookback Slicing Optimization¶
Dataset: 10 years daily (2,520 days), 50 assets
| Rebalance Freq | Runtime (s) | Rebalances | Slices Before | Slices After | Reduction |
|---|---|---|---|---|---|
| Weekly | 4.22 | 504 | 2,520 | 504 | 80% |
| Monthly | 2.40 | 116 | 2,520 | 116 | 95% |
| Quarterly | 1.82 | 39 | 2,520 | 39 | 98% |
Key Finding: Slice operations reduced from O(n²) to O(rebalances)
Backtest Performance by Configuration¶
Dataset: 5 years daily, 100 assets
| Strategy | Rebalance Freq | Time (s) | Memory (MB) | Rebalances |
|---|---|---|---|---|
| Equal Weight | Monthly | 5.2 | 125 | 60 |
| Equal Weight | Quarterly | 3.8 | 120 | 20 |
| Risk Parity | Monthly | 12.4 | 180 | 60 |
| Risk Parity | Quarterly | 8.1 | 165 | 20 |
| Mean-Variance | Monthly | 18.7 | 220 | 60 |
| Mean-Variance | Quarterly | 11.3 | 195 | 20 |
Observations:
- Equal weight: Fastest (no optimization)
- Risk parity: 2.4x slower (iterative solver)
- Mean-variance: 3.6x slower (quadratic programming)
Data Loading Benchmarks¶
Column Filtering Performance¶
Methodology: Load subset of assets from large CSV
| Total Assets | Requested | Load Time (ms) | Memory (MB) | Memory Reduction |
|---|---|---|---|---|
| 100 | 10 | 45.2 | 12.3 | 60% |
| 500 | 20 | 52.8 | 15.7 | 60% |
| 1000 | 50 | 68.4 | 28.9 | 60% |
| 5000 | 100 | 142.7 | 85.2 | 60% |
Key Finding: Consistent 60% memory reduction when loading subset
Date Filtering Performance¶
Dataset: 1000 assets, 10 years daily data
| Date Range | Rows Before | Rows After | Load Time (ms) | Memory (MB) |
|---|---|---|---|---|
| Full (10y) | 2,520 | 2,520 | 68.4 | 28.9 |
| 5 years | 2,520 | 1,260 | 45.2 | 14.8 |
| 3 years | 2,520 | 756 | 31.8 | 9.2 |
| 1 year | 2,520 | 252 | 18.5 | 3.4 |
Key Finding: Time and memory scale linearly with date range
Combined Filtering (Columns + Dates)¶
Scenario: 1000 assets total, load 50 assets for 3 years
| Optimization | Load Time (ms) | Memory (MB) | Reduction |
|---|---|---|---|
| None (load all) | 68.4 | 28.9 | - (baseline) |
| Columns only | 31.2 | 11.6 | 59.9% |
| Dates only | 45.8 | 8.7 | 69.9% |
| Both | 18.3 | 4.6 | 84.1% |
Key Finding: Combined filtering achieves 84% memory reduction
Streaming Diagnostics Benchmarks¶
Memory Usage Comparison¶
Dataset: 36 Stooq files, ~13,000 lines each
| Approach | Peak Memory (MB) | Total Time (s) | Per File (ms) |
|---|---|---|---|
| Full DataFrame | 23.76 | 5.469 | 151.93 |
| Streaming | 13.38 | 5.949 | 165.25 |
| Reduction | 43.7% | -8.8% | -8.8% |
Trade-off Analysis:
- Memory: 43.7% reduction (significant benefit)
- Time: 8.8% increase (acceptable cost)
- Overall: Clear win for memory-constrained scenarios
Streaming Performance by File Size¶
| File Size (KB) | Rows | Full Load (ms) | Streaming (ms) | Memory Savings |
|---|---|---|---|---|
| 50 | 500 | 12.3 | 15.2 | 38% |
| 250 | 2500 | 58.7 | 67.4 | 42% |
| 500 | 5000 | 115.2 | 131.8 | 44% |
| 1000 | 10000 | 228.4 | 259.1 | 45% |
Key Finding: Memory savings consistent across file sizes (~40-45%)
Extrapolation to Production Scale¶
Scenario: 70,000 Stooq files in production
| Metric | Full DataFrame | Streaming | Improvement |
|---|---|---|---|
| Peak Memory | 45.1 GB | 25.4 GB | 19.7 GB saved |
| Total Time | 177 min | 193 min | 16 min slower |
| Files/sec | 6.6 | 6.0 | -9% |
Analysis: 19.7 GB memory savings justified by 16-minute time cost
Caching Benchmarks¶
Factor Cache Performance¶
Scenario: 10 rebalances with overlapping data windows
| Cache State | Total Time (s) | Cache Hits | Cache Misses | Hit Rate |
|---|---|---|---|---|
| First run (empty) | 0.68 | 0 | 10 | 0% |
| Second run (warm) | 0.08 | 10 | 0 | 100% |
| Different dates | 0.34 | 5 | 5 | 50% |
Key Finding: 88% time savings on cache hits
PriceLoader Cache Performance¶
Scenario: 5000 unique price files loaded sequentially
| Cache Size | Peak Memory (MB) | Time (s) | Hit Rate |
|---|---|---|---|
| Unbounded | 3,550 | 28.4 | N/A |
| 1000 | 710 | 32.1 | 20% |
| 500 | 355 | 34.8 | 10% |
| 0 (disabled) | 71 | 42.3 | 0% |
Key Finding: Bounded cache (1000) provides 80% memory reduction with 13% time cost
System-Level Benchmarks¶
End-to-End Backtest Performance¶
Configuration:
- Universe: 500 assets
- Period: 10 years daily (2,520 days)
- Rebalancing: Monthly (120 rebalances)
- Strategy: Risk Parity with preselection (top 50)
- Features: PIT eligibility, membership policy, caching
| Component | Time (s) | % of Total |
|---|---|---|
| Data loading | 1.2 | 5% |
| Initial eligibility | 0.8 | 3% |
| Rebalancing (120x) | 18.4 | 75% |
| - Preselection | 8.2 | 33% |
| - Membership policy | 1.5 | 6% |
| - Portfolio optimization | 6.9 | 28% |
| - Transaction costs | 1.8 | 7% |
| Metrics calculation | 2.1 | 9% |
| Report generation | 2.0 | 8% |
| Total | 24.5 | 100% |
Key Insights:
- Rebalancing dominates (75% of time)
- Preselection + optimization = 61% of total
- Data loading well-optimized (5%)
Hardware Comparison¶
Performance Across Systems¶
Benchmark: 1000 assets, 5 years, monthly rebalancing, equal weight
| System | CPU | RAM | Time (s) | Relative |
|---|---|---|---|---|
| GitHub Actions | 2-core, 7GB | 7GB | 5.2 | 1.0x (baseline) |
| MacBook Pro M1 | 8-core, 16GB | 16GB | 2.1 | 2.5x faster |
| Linux Workstation | 16-core, 64GB | 64GB | 1.4 | 3.7x faster |
| AWS t3.medium | 2-core, 4GB | 4GB | 6.8 | 0.8x slower |
Observations:
- System scales well with additional cores
- Memory not a bottleneck for typical use
- AWS t3.medium adequate for small/medium universes
Benchmark Methodology¶
Statistical Approach¶
All benchmarks follow these principles:
- Multiple Runs: Minimum 3 runs, typically 5-10
- Warm-up: One warm-up run before measurement
- Statistical Analysis: Report mean, std dev, min, max
- Outlier Removal: Discard runs >2 std dev from mean
- Reproducibility: Seed random generators
Measurement Tools¶
- Time:
time.perf_counter()for high precision - Memory:
tracemallocfor Python memory - Profiling:
cProfilefor detailed breakdown - System:
psutilfor system-level metrics
Reporting Standards¶
All benchmarks report:
- Absolute values (time in seconds, memory in MB)
- Relative improvements (speedup, reduction %)
- Statistical measures (mean, std dev)
- Test conditions (dataset, configuration)
Related Documentation¶
- Optimization Guide - Optimization techniques and strategies
- Profiling Guide - How to profile and measure performance
- Performance README - Performance documentation index