Asset Preselection¶
Overview¶
The asset preselection module provides a deterministic, factor-based approach to reducing the investment universe size before portfolio optimization. This is particularly useful for:
- Large universes (100+ assets) where optimization becomes computationally expensive
- Factor-tilted strategies that want to emphasize momentum or low-volatility characteristics
- Multi-stage portfolio construction workflows
Key Feature: All preselection logic uses only data available up to the rebalance date, ensuring no lookahead bias.
Supported Methods¶
1. Momentum¶
Selects assets with the highest cumulative returns over a lookback period.
Formula: Cumulative return = (1+r₁) × (1+r₂) × ... × (1+rₙ) - 1
Parameters:
lookback: Number of periods to look back (default: 252 days = 1 year)skip: Number of most recent periods to skip (default: 1)- Common practice: skip most recent day to avoid short-term reversals
- Example: with
lookback=252, skip=1, uses 12-month return excluding last day (aka "12-1 momentum")
Example:
from portfolio_management.portfolio import Preselection, PreselectionConfig, PreselectionMethod
config = PreselectionConfig(
method=PreselectionMethod.MOMENTUM,
top_k=30,
lookback=252, # 1 year
skip=1, # Skip most recent day
)
preselection = Preselection(config)
selected_assets = preselection.select_assets(returns, rebalance_date)
2. Low-Volatility¶
Selects assets with the lowest realized volatility over a lookback period.
Formula: Inverse volatility = 1 / std(returns)
Higher scores indicate lower volatility (more attractive for defensive strategies).
Parameters:
lookback: Number of periods for volatility calculation (default: 252 days)
Example:
config = PreselectionConfig(
method=PreselectionMethod.LOW_VOL,
top_k=30,
lookback=252,
)
preselection = Preselection(config)
selected_assets = preselection.select_assets(returns, rebalance_date)
3. Combined¶
Combines momentum and low-volatility factors using weighted Z-scores.
Process:
- Compute momentum scores for all assets
- Compute low-volatility scores for all assets
- Standardize each factor to Z-scores (mean=0, std=1)
- Combine:
combined_score = w₁ × momentum_z + w₂ × low_vol_z - Select top-K assets by combined score
Parameters:
momentum_weight: Weight for momentum factor (default: 0.5)low_vol_weight: Weight for low-volatility factor (default: 0.5)- Weights must sum to 1.0
Example:
config = PreselectionConfig(
method=PreselectionMethod.COMBINED,
top_k=30,
lookback=252,
momentum_weight=0.6, # 60% momentum
low_vol_weight=0.4, # 40% low-vol
)
preselection = Preselection(config)
selected_assets = preselection.select_assets(returns, rebalance_date)
Deterministic Tie-Breaking¶
When multiple assets have identical scores at the cutoff point, ties are broken alphabetically by asset symbol. This ensures:
- Reproducible results across runs
- Consistent behavior in backtests
- No random selection bias
Example: If selecting top-5 assets and ranks 4-7 all have score 0.15:
- Asset D (score 0.20)
- Asset A (score 0.18)
- Asset B (score 0.16)
- Asset C (score 0.15) ← Selected (first alphabetically)
- Asset E (score 0.15) ← Not selected
- Asset F (score 0.15) ← Not selected
- Asset G (score 0.14)
No Lookahead Guarantee¶
The preselection engine strictly uses only data available up to (but not including) the rebalance date:
# Correct: Uses only data before rebalance date
rebalance_date = date(2023, 6, 1)
selected = preselection.select_assets(returns, rebalance_date)
# Uses returns from start through 2023-05-31
# Also correct: Without date, uses all available data
selected = preselection.select_assets(returns)
This prevents lookahead bias in backtests.
CLI Usage¶
Basic Command¶
Run a backtest with momentum preselection selecting top 30 assets:
python scripts/run_backtest.py equal_weight \
--universe config/universes.yaml:core_global \
--start-date 2020-01-01 \
--end-date 2023-12-31 \
--preselect-method momentum \
--preselect-top-k 30
Advanced Examples¶
Example 1: Combined Factor with Custom Weights¶
Run mean-variance optimization with 60% momentum, 40% low-volatility:
python scripts/run_backtest.py mean_variance \
--universe config/universes.yaml:satellite_factor \
--start-date 2020-01-01 \
--end-date 2023-12-31 \
--preselect-method combined \
--preselect-top-k 30 \
--preselect-lookback 252 \
--preselect-skip 1 \
--preselect-momentum-weight 0.6 \
--preselect-low-vol-weight 0.4 \
--rebalance-frequency monthly
Expected Output: Backtest results showing ~30 assets selected per month, with performance metrics.
Example 2: Defensive Low-Volatility Strategy¶
Select top 25 lowest-volatility assets quarterly:
python scripts/run_backtest.py risk_parity \
--universe config/universes.yaml:core_global \
--start-date 2018-01-01 \
--end-date 2023-12-31 \
--preselect-method low_vol \
--preselect-top-k 25 \
--preselect-lookback 252 \
--rebalance-frequency quarterly
Expected Output: Lower volatility portfolio with more stable asset selection.
Example 3: Aggressive Momentum with Monthly Skip¶
Classic "12-1 momentum" strategy:
python scripts/run_backtest.py equal_weight \
--universe config/universes.yaml:satellite_factor \
--start-date 2015-01-01 \
--end-date 2023-12-31 \
--preselect-method momentum \
--preselect-top-k 40 \
--preselect-lookback 252 \
--preselect-skip 21 \
--rebalance-frequency monthly
Expected Output: Higher turnover, momentum-tilted portfolio excluding most recent month.
Example 4: Large Universe Reduction¶
Reduce 1000-asset universe to 50 for optimization:
python scripts/run_backtest.py mean_variance \
--universe config/universes.yaml:long_history_1000 \
--start-date 2010-01-01 \
--end-date 2023-12-31 \
--preselect-method combined \
--preselect-top-k 50 \
--preselect-lookback 252 \
--preselect-momentum-weight 0.5 \
--preselect-low-vol-weight 0.5 \
--rebalance-frequency monthly \
--verbose
Expected Output: Dramatic speedup (hours → minutes), comparable performance to full universe.
CLI Parameters Reference¶
| Parameter | Type | Description | Default | Valid Values |
|---|---|---|---|---|
--preselect-method |
str | Factor method | None | momentum, low_vol, combined |
--preselect-top-k |
int | Number of assets to select | None | 1-N (N=universe size) |
--preselect-lookback |
int | Lookback period (days) | 252 | 30-1260 |
--preselect-skip |
int | Skip recent N days | 1 | 0-lookback |
--preselect-momentum-weight |
float | Momentum weight (combined) | 0.5 | 0.0-1.0 |
--preselect-low-vol-weight |
float | Low-vol weight (combined) | 0.5 | 0.0-1.0 |
Notes:
- If
--preselect-methodis not specified, preselection is disabled - Weights must sum to 1.0 when using
combinedmethod --verboseflag provides factor score statistics in output
Universe Configuration¶
Preselection can be configured directly in universe YAML files:
universes:
satellite_factor:
description: "Factor sleeve with momentum/low-vol preselection"
filter_criteria:
# ... asset selection criteria ...
preselection:
method: "combined" # momentum, low_vol, or combined
top_k: 25 # Select top 25 assets
lookback: 252 # 1 year lookback
skip: 1 # Skip most recent day
momentum_weight: 0.5 # 50% momentum
low_vol_weight: 0.5 # 50% low-volatility
min_periods: 60 # Minimum data required
Loading from Universe¶
When using manage_universes.py or loading universes programmatically, preselection is automatically configured:
from portfolio_management.assets.universes import UniverseConfigLoader
from portfolio_management.portfolio import create_preselection_from_dict
# Load universe
config_loader = UniverseConfigLoader()
universes = config_loader.load_config(Path("config/universes.yaml"))
universe_def = universes["satellite_factor"]
# Create preselection if configured
if universe_def.preselection:
preselection = create_preselection_from_dict(universe_def.preselection)
Integration with Backtesting¶
Preselection integrates seamlessly with the backtesting engine:
from portfolio_management.backtesting import BacktestEngine, BacktestConfig
from portfolio_management.portfolio import (
EqualWeightStrategy,
Preselection,
PreselectionConfig,
PreselectionMethod,
)
# Configure preselection
preselection_config = PreselectionConfig(
method=PreselectionMethod.MOMENTUM,
top_k=30,
lookback=252,
)
preselection = Preselection(preselection_config)
# Run backtest with preselection
engine = BacktestEngine(
config=backtest_config,
strategy=EqualWeightStrategy(),
prices=prices,
returns=returns,
preselection=preselection, # Add preselection here
)
equity_curve, metrics, events = engine.run()
The engine automatically applies preselection at each rebalance:
- Compute factor scores using data up to rebalance date
- Select top-K assets
- Pass filtered returns to portfolio strategy
- Optimize portfolio on selected subset only
No changes to portfolio strategies are required - preselection is applied transparently before strategy execution.
Use Cases¶
1. Large Universe Optimization¶
Problem: Optimizing 500+ assets is computationally expensive.
Solution: Use preselection to reduce to manageable size (e.g., 50 assets).
2. Momentum-Tilted Portfolio¶
Problem: Want to emphasize recent winners without explicit momentum overlay.
Solution: Preselect high-momentum assets, then optimize weights.
config = PreselectionConfig(
method=PreselectionMethod.MOMENTUM,
top_k=30,
lookback=252,
skip=21, # Skip most recent month
)
3. Low-Volatility Strategy¶
Problem: Want defensive portfolio with minimal drawdowns.
Solution: Preselect low-volatility assets, then apply equal-weight or risk-parity.
4. Balanced Factor Exposure¶
Problem: Want both momentum and stability.
Solution: Use combined method with equal weights.
config = PreselectionConfig(
method=PreselectionMethod.COMBINED,
top_k=30,
lookback=252,
momentum_weight=0.5,
low_vol_weight=0.5,
)
Performance Considerations¶
Computational Complexity¶
Factor Computation: All factors exhibit linear O(N × L) complexity:
- Momentum: O(N × L) - cumulative product over lookback
- Low-volatility: O(N × L) - standard deviation computation
- Combined: O(N × L) - both factors computed, then combined with Z-scores
Typical Timings (from benchmarks on GitHub Actions runner):
| Universe Size | Lookback | Momentum | Low-Vol | Combined | Memory |
|---|---|---|---|---|---|
| 100 assets | 252 days | 2.3ms | 2.5ms | 4.8ms | 2MB |
| 500 assets | 252 days | 11.5ms | 12.8ms | 24.3ms | 10MB |
| 1000 assets | 252 days | 24.6ms | 27.1ms | 51.7ms | 20MB |
| 5000 assets | 252 days | 125.8ms | 138.5ms | 264.3ms | 100MB |
Key Finding: Preselection adds \<0.1 second per rebalance for typical universes (500-1000 assets).
Lookback Impact: Minimal performance variance (\<10%) across lookback periods 30-504 days.
Memory Usage¶
Preselection operates on returns DataFrame directly without creating large intermediate structures:
- Formula: N assets × L periods × 8 bytes (float64)
- Examples:
- 500 assets × 252 days ≈ 1 MB
- 1000 assets × 252 days ≈ 2 MB
- 5000 assets × 252 days ≈ 10 MB
Memory Efficiency: Peak usage \<200MB for 5000 assets (validated in profiling).
Optimization Impact¶
Reducing universe from N to K assets dramatically improves portfolio optimization:
Mean-Variance Optimization: O(N³) → O(K³)
- 500 → 50 assets: 1000× speedup
- 1000 → 100 assets: 1000× speedup
- 5000 → 500 assets: 1000× speedup
Risk Parity: O(N²) → O(K²)
- 500 → 50 assets: 100× speedup
- 1000 → 100 assets: 100× speedup
Real-World Impact: For large universes (1000+ assets), preselection can reduce backtest time from hours to minutes.
Scalability¶
Based on comprehensive profiling (benchmarks/benchmark_preselection.py):
- ✅ Linear scaling: Confirmed O(N) complexity through N=5000
- ✅ No bottlenecks: Dominant cost is factor computation (70-80% of time)
- ✅ Multiple rebalances: 24 monthly rebalances on 1000 assets \<2 seconds total
- ✅ Stable performance: No performance degradation with repeated calls
Recommendation: Safe to use with universes up to 5000 assets without caching.
Edge Cases¶
Insufficient Data¶
If an asset has fewer than min_periods returns:
- Factor score is set to NaN
- Asset is excluded from selection
All NaN Scores¶
If all assets have insufficient data:
- Returns empty list
- Backtest will fail (no assets to invest in)
Top-K Exceeds Universe Size¶
If top_k > number of valid assets:
- Returns all valid assets
- No error raised
Tied Scores at Cutoff¶
Multiple assets with identical scores at position K:
- Ties broken alphabetically by symbol
- First K (alphabetically) selected
Tuning Guidance¶
Lookback Period Selection¶
Rule of Thumb: Align lookback with rebalance frequency for stability.
| Rebalance Frequency | Recommended Lookback | Rationale |
|---|---|---|
| Weekly | 63 days (3 months) | Captures short-term trends |
| Monthly | 252 days (1 year) | Standard annual momentum |
| Quarterly | 504 days (2 years) | Long-term factor stability |
| Annual | 756 days (3 years) | Very stable, less responsive |
Trade-offs:
- Shorter lookback (60-126 days): More responsive, higher turnover, noisier signals
- Longer lookback (252-504 days): More stable, lower turnover, slower adaptation
- Impact on performance: Lookback has minimal computational impact (\<10% variance)
Top-K Selection Guidelines¶
Universe Size vs. Top-K Recommendations:
| Universe Size | Conservative K | Moderate K | Aggressive K | Impact |
|---|---|---|---|---|
| 100-200 | 15-25 | 25-40 | 40-60 | High concentration risk if too low |
| 200-500 | 20-35 | 35-60 | 60-100 | Balance optimization benefit vs. diversification |
| 500-1000 | 30-50 | 50-100 | 100-200 | Optimization speedup: O(N³) → O(K³) |
| 1000+ | 50-75 | 75-150 | 150-300 | Maximum computational savings |
Considerations:
- Too small (K\<20): High concentration risk, few diversification benefits
- Too large (K>100 for small universe): Loses preselection benefit, minimal speedup
- Sweet spot: K ≈ 10-30% of universe size for most strategies
Skip Period Tuning¶
Common Skip Configurations:
| Skip Days | Use Case | Evidence |
|---|---|---|
| 0 | No skip | Use most recent data, may include noise |
| 1 | Standard (default) | Skip last day to avoid short-term reversals |
| 5 | Weekly skip | Skip last week for cleaner signals |
| 21 | Monthly skip | Classic "12-1 momentum" (12 months excluding last month) |
Research-backed recommendation: Skip 1-21 days for momentum strategies to avoid short-term mean reversion.
Combined Factor Weights¶
Common Weight Configurations:
| Momentum Weight | Low-Vol Weight | Strategy Type | Characteristics |
|---|---|---|---|
| 0.5 | 0.5 | Balanced | Equal factor exposure |
| 0.6-0.7 | 0.3-0.4 | Growth-oriented | Emphasis on momentum |
| 0.3-0.4 | 0.6-0.7 | Defensive | Emphasis on low-volatility |
| 0.8+ | 0.2- | Aggressive momentum | Maximum return chase |
| 0.2- | 0.8+ | Maximum defense | Minimal drawdowns |
Tuning process:
- Start with equal weights (0.5/0.5)
- Run backtest to measure turnover and performance
- Adjust weights based on risk tolerance
- Validate with out-of-sample data
Validation Best Practices¶
-
Backtest First:
-
Run with and without preselection
- Compare performance metrics (Sharpe ratio, turnover, max drawdown)
-
Measure computational time savings
-
Monitor Turnover:
-
Calculate asset turnover at each rebalance
- High turnover (>50% per rebalance) may indicate:
- Lookback too short
- Top-K too small
- Factor signals too noisy
-
Consider adding membership policy for turnover control
-
Performance Impact:
-
Expected computational savings: 10-100x for large universes
- Memory usage: Minimal (factor scores \<1MB per 1000 assets)
-
Time per rebalance: \<0.1s for 1000 assets with typical lookback
-
Robustness Checks:
-
Test with different market conditions (bull, bear, sideways)
- Verify no lookahead bias (use rebalance_date parameter)
- Check factor score distributions for extremes/outliers
Limitations¶
-
Factor Timing: Preselection assumes factors persist across rebalance periods. May not work well with short rebalance frequencies.
-
Transaction Costs: Preselection adds turnover (assets enter/exit selected set). Factor this into cost model.
-
Survivorship Bias: Preselection on recent performance can amplify survivorship bias if dead assets aren't in the universe.
-
Strategy Compatibility: Works best with:
-
Equal-weight (no optimization needed)
- Risk parity (scales well to smaller universes)
- Mean-variance (benefits from universe reduction)
Testing¶
Comprehensive test suite covers:
- Factor computation accuracy
- Deterministic tie-breaking
- No-lookahead validation
- Edge cases (NaN, insufficient data, ties)
- Integration with strategies
Run tests:
References¶
Academic Research¶
-
Momentum: Jegadeesh & Titman (1993) - "Returns to Buying Winners and Selling Losers"
-
Foundational momentum research showing 12-month returns predict future performance
-
Skip period (1 month) important to avoid short-term reversals
-
Low-Volatility: Ang, Hodrick, Xing & Zhang (2006) - "The Cross-Section of Volatility and Expected Returns"
-
Documents low-volatility anomaly (defensive assets outperform high-volatility)
-
Theory: volatility-averse investors drive up prices of stable assets
-
Combined Factors: Asness, Moskowitz & Pedersen (2013) - "Value and Momentum Everywhere"
-
Shows momentum and value work together across asset classes
- Factor combination improves risk-adjusted returns
Performance Documentation¶
-
Benchmarks: See
docs/performance/preselection_profiling.md -
Detailed profiling results
- Scalability analysis (100-5000 assets)
- Memory usage characterization
-
Optimization recommendations
-
Integration Tests:
tests/portfolio/test_preselection.py -
Factor computation accuracy tests
- Deterministic behavior validation
-
Edge case handling
-
Edge Cases:
tests/integration/test_preselection_edge_cases.py -
Insufficient data scenarios
- Tie-breaking validation
- Lookback boundary conditions
Related Documentation¶
- Portfolio Construction - Strategy implementation details
- Backtesting - Full backtest workflow with preselection
- Membership Policy - Turnover control to complement preselection
- Universe Management - Universe configuration with preselection blocks
- Fast I/O - Optional speedups for loading large datasets
- Statistics Caching - Cache preselection factor scores