Skip to content

Performance Benchmarks

This document contains detailed benchmark results for various components of the portfolio management system.

Benchmark Environment

All benchmarks run in standardized conditions:

  • Hardware: GitHub Actions runners (2-core, 7GB RAM)
  • Python: 3.12.x
  • Pandas: 2.3+
  • NumPy: 2.0+
  • Methodology: Multiple runs with statistical analysis

Asset Selection Benchmarks

Vectorization Improvements

Dataset: 10,000 row selection DataFrame

Operation Before (ms) After (ms) Speedup Improvement
Basic filtering 3871.2 52.77 73.4x 98.6%
Complex filtering 1389.1 17.70 78.5x 98.7%
Severity filtering 2171.3 41.88 51.8x 98.1%
Allow/blocklist 4989.2 24.17 206.4x 99.5%

Scalability:

  • Time per asset: 5.3µs (constant across all sizes)
  • Memory per asset: 8.5KB (constant)
  • Scaling: O(n) linear

Selection Performance by Universe Size

Universe Size Time (ms) Memory (MB) Assets/sec
100 0.53 0.85 188,679
500 2.65 4.25 188,679
1000 5.30 8.50 188,679
5000 26.50 42.50 188,679
10000 53.00 85.00 188,679

Key Finding: Perfect linear scaling with constant per-asset performance

Preselection Benchmarks

Factor Computation by Method

Configuration: 1000 assets, 252-day lookback, single rebalance

Method Time (ms) Memory (MB) Std Dev (ms)
Momentum 68.3 85.2 3.1
Low Volatility 71.2 87.4 3.4
Combined (50/50) 79.5 92.1 3.8

Observations:

  • Combined method adds 16% overhead
  • All methods complete in \<100ms for 1000 assets
  • Memory usage increases ~8% for combined

Scalability by Universe Size

Configuration: Momentum method, 252-day lookback

Assets Time (ms) Memory (MB) Time/Asset (µs) Scaling
100 8.2 12.3 82.0 1.0x
250 18.5 24.8 74.0 2.3x
500 35.1 44.9 70.2 4.3x
1000 68.3 85.2 68.3 8.3x
2500 165.2 195.1 66.1 20.1x
5000 327.8 379.8 65.6 40.0x

Key Finding: Near-perfect linear scaling (O(n) = 66µs per asset)

Lookback Period Impact

Configuration: 1000 assets, momentum method

Lookback (days) Time (ms) Memory (MB) Overhead vs 30d
30 65.1 78.2 - (baseline)
63 66.8 82.1 +2.6%
126 67.9 85.0 +4.3%
252 68.3 85.2 +4.9%
504 71.2 91.8 +9.4%

Key Finding: Lookback period has minimal impact (\<10% even at 2 years)

Multiple Rebalancing Performance

Scenario: Monthly rebalancing over 2 years (24 rebalances)

Assets Total Time (s) Per Rebalance (ms) Throughput (rebal/s)
100 0.20 8.3 120.5
250 0.44 18.3 54.6
500 0.84 35.0 28.6
1000 1.64 68.3 14.6
2500 3.96 165.0 6.1
5000 7.87 328.0 3.0

Key Finding: 24 monthly rebalances with 1000 assets completes in 1.6 seconds

Backtest Engine Benchmarks

Lookback Slicing Optimization

Dataset: 10 years daily (2,520 days), 50 assets

Rebalance Freq Runtime (s) Rebalances Slices Before Slices After Reduction
Weekly 4.22 504 2,520 504 80%
Monthly 2.40 116 2,520 116 95%
Quarterly 1.82 39 2,520 39 98%

Key Finding: Slice operations reduced from O(n²) to O(rebalances)

Backtest Performance by Configuration

Dataset: 5 years daily, 100 assets

Strategy Rebalance Freq Time (s) Memory (MB) Rebalances
Equal Weight Monthly 5.2 125 60
Equal Weight Quarterly 3.8 120 20
Risk Parity Monthly 12.4 180 60
Risk Parity Quarterly 8.1 165 20
Mean-Variance Monthly 18.7 220 60
Mean-Variance Quarterly 11.3 195 20

Observations:

  • Equal weight: Fastest (no optimization)
  • Risk parity: 2.4x slower (iterative solver)
  • Mean-variance: 3.6x slower (quadratic programming)

Data Loading Benchmarks

Column Filtering Performance

Methodology: Load subset of assets from large CSV

Total Assets Requested Load Time (ms) Memory (MB) Memory Reduction
100 10 45.2 12.3 60%
500 20 52.8 15.7 60%
1000 50 68.4 28.9 60%
5000 100 142.7 85.2 60%

Key Finding: Consistent 60% memory reduction when loading subset

Date Filtering Performance

Dataset: 1000 assets, 10 years daily data

Date Range Rows Before Rows After Load Time (ms) Memory (MB)
Full (10y) 2,520 2,520 68.4 28.9
5 years 2,520 1,260 45.2 14.8
3 years 2,520 756 31.8 9.2
1 year 2,520 252 18.5 3.4

Key Finding: Time and memory scale linearly with date range

Combined Filtering (Columns + Dates)

Scenario: 1000 assets total, load 50 assets for 3 years

Optimization Load Time (ms) Memory (MB) Reduction
None (load all) 68.4 28.9 - (baseline)
Columns only 31.2 11.6 59.9%
Dates only 45.8 8.7 69.9%
Both 18.3 4.6 84.1%

Key Finding: Combined filtering achieves 84% memory reduction

Streaming Diagnostics Benchmarks

Memory Usage Comparison

Dataset: 36 Stooq files, ~13,000 lines each

Approach Peak Memory (MB) Total Time (s) Per File (ms)
Full DataFrame 23.76 5.469 151.93
Streaming 13.38 5.949 165.25
Reduction 43.7% -8.8% -8.8%

Trade-off Analysis:

  • Memory: 43.7% reduction (significant benefit)
  • Time: 8.8% increase (acceptable cost)
  • Overall: Clear win for memory-constrained scenarios

Streaming Performance by File Size

File Size (KB) Rows Full Load (ms) Streaming (ms) Memory Savings
50 500 12.3 15.2 38%
250 2500 58.7 67.4 42%
500 5000 115.2 131.8 44%
1000 10000 228.4 259.1 45%

Key Finding: Memory savings consistent across file sizes (~40-45%)

Extrapolation to Production Scale

Scenario: 70,000 Stooq files in production

Metric Full DataFrame Streaming Improvement
Peak Memory 45.1 GB 25.4 GB 19.7 GB saved
Total Time 177 min 193 min 16 min slower
Files/sec 6.6 6.0 -9%

Analysis: 19.7 GB memory savings justified by 16-minute time cost

Caching Benchmarks

Factor Cache Performance

Scenario: 10 rebalances with overlapping data windows

Cache State Total Time (s) Cache Hits Cache Misses Hit Rate
First run (empty) 0.68 0 10 0%
Second run (warm) 0.08 10 0 100%
Different dates 0.34 5 5 50%

Key Finding: 88% time savings on cache hits

PriceLoader Cache Performance

Scenario: 5000 unique price files loaded sequentially

Cache Size Peak Memory (MB) Time (s) Hit Rate
Unbounded 3,550 28.4 N/A
1000 710 32.1 20%
500 355 34.8 10%
0 (disabled) 71 42.3 0%

Key Finding: Bounded cache (1000) provides 80% memory reduction with 13% time cost

System-Level Benchmarks

End-to-End Backtest Performance

Configuration:

  • Universe: 500 assets
  • Period: 10 years daily (2,520 days)
  • Rebalancing: Monthly (120 rebalances)
  • Strategy: Risk Parity with preselection (top 50)
  • Features: PIT eligibility, membership policy, caching
Component Time (s) % of Total
Data loading 1.2 5%
Initial eligibility 0.8 3%
Rebalancing (120x) 18.4 75%
- Preselection 8.2 33%
- Membership policy 1.5 6%
- Portfolio optimization 6.9 28%
- Transaction costs 1.8 7%
Metrics calculation 2.1 9%
Report generation 2.0 8%
Total 24.5 100%

Key Insights:

  • Rebalancing dominates (75% of time)
  • Preselection + optimization = 61% of total
  • Data loading well-optimized (5%)

Hardware Comparison

Performance Across Systems

Benchmark: 1000 assets, 5 years, monthly rebalancing, equal weight

System CPU RAM Time (s) Relative
GitHub Actions 2-core, 7GB 7GB 5.2 1.0x (baseline)
MacBook Pro M1 8-core, 16GB 16GB 2.1 2.5x faster
Linux Workstation 16-core, 64GB 64GB 1.4 3.7x faster
AWS t3.medium 2-core, 4GB 4GB 6.8 0.8x slower

Observations:

  • System scales well with additional cores
  • Memory not a bottleneck for typical use
  • AWS t3.medium adequate for small/medium universes

Benchmark Methodology

Statistical Approach

All benchmarks follow these principles:

  1. Multiple Runs: Minimum 3 runs, typically 5-10
  2. Warm-up: One warm-up run before measurement
  3. Statistical Analysis: Report mean, std dev, min, max
  4. Outlier Removal: Discard runs >2 std dev from mean
  5. Reproducibility: Seed random generators

Measurement Tools

  • Time: time.perf_counter() for high precision
  • Memory: tracemalloc for Python memory
  • Profiling: cProfile for detailed breakdown
  • System: psutil for system-level metrics

Reporting Standards

All benchmarks report:

  • Absolute values (time in seconds, memory in MB)
  • Relative improvements (speedup, reduction %)
  • Statistical measures (mean, std dev)
  • Test conditions (dataset, configuration)