Advanced Portfolio Features Guide¶
Comprehensive Tutorial: Filtering, Factor Selection, and Advanced Backtesting
This guide demonstrates the sophisticated filtering and portfolio management capabilities available in this toolkit, using a real-world S&P 500 blue chip portfolio example.
Table of Contents¶
- Universe Filtering
- Multi-Factor Preselection
- Membership Policy (Turnover Control)
- Point-in-Time Eligibility
- Statistics Caching
- Complete Workflow Example
Universe Filtering¶
Quality Filters¶
Goal: Select only high-quality assets with sufficient history and clean data.
filter_criteria:
# Data quality - "ok" = clean, "warning" = minor issues, "error" = problematic
data_status: ["ok"]
# History requirements
min_history_days: 7300 # 20 years of calendar days
min_price_rows: 5040 # 20 years of trading days
# Market/geography filters
markets: ["NYSE", "NSQ"] # US exchanges only
currencies: ["USD"] # Dollar-denominated only
# Category filters (data source structure)
categories:
- "nyse stocks/1" # Tier 1 NYSE stocks (large caps)
- "nasdaq stocks/1" # Tier 1 NASDAQ stocks
What This Does:
- Eliminates penny stocks and small caps (via tier filtering)
- Ensures 20 years of history (survives 2000 dot-com, 2008 crisis, 2020 COVID)
- Only clean data (no gaps, missing dates, or quality issues)
- US-only (avoids currency risk)
Allow/Blocklists¶
Manual Control: Override automatic selection for specific tickers.
filter_criteria:
allowlist:
- AAPL # Force include Apple
- MSFT # Force include Microsoft
- GOOGL # Force include Alphabet
blocklist:
- GME # Exclude GameStop (too volatile)
- AMC # Exclude AMC (meme stock)
Use Cases:
- Allowlist: Force include stocks you know are high-quality
- Blocklist: Exclude stocks with known issues (bankruptcy risk, regulatory problems)
Classification Requirements¶
Goal: Filter by asset type after selection.
classification_requirements:
asset_class: ["equity"] # Stocks only (no bonds, commodities)
sub_class: ["growth", "value"] # Growth or value stocks
geography: ["north_america"] # US/Canada only
Asset Classes Available:
equity,bond,commodity,real_estate,cash,alternative
Sub-Classes for Equity:
growth,value,small_cap,dividend,tech,financial,healthcare
Multi-Factor Preselection¶
Why Preselection?¶
Problem: Large universes (100+ stocks) make optimization slow and unstable.
Solution: Use factor-based preselection to reduce universe size before optimization.
Factor 1: Momentum¶
Strategy: Select stocks with highest past returns (trend-following).
from portfolio_management.portfolio import PreselectionConfig, PreselectionMethod
config = PreselectionConfig(
method=PreselectionMethod.MOMENTUM,
top_k=30, # Select 30 stocks
lookback=252, # Use 1-year return (252 trading days)
skip=21, # Skip last month (20 trading days)
min_periods=126, # Need 6 months minimum data
)
Formula:
Why Skip Last Month?
- Short-term reversals are common (winners become losers)
- "12-1 momentum" (12 months minus 1 month) is empirically stronger
- Avoids trading on noise
Example Scores:
Stock A: +45% return (lookback period) → High score → Selected
Stock B: +5% return → Low score → Not selected
Stock C: -10% return → Negative → Not selected
Factor 2: Low-Volatility¶
Strategy: Select stocks with lowest realized volatility (defensive).
config = PreselectionConfig(
method=PreselectionMethod.LOW_VOL,
top_k=30,
lookback=252, # Use 1-year volatility
min_periods=126,
)
Formula:
Higher score = lower volatility = more attractive (defensive bias).
Example:
Stock A: 15% annual vol → Score = 1/0.15 = 6.67 → High score
Stock B: 40% annual vol → Score = 1/0.40 = 2.50 → Low score
Factor 3: Combined (Multi-Factor)¶
Strategy: Combine momentum and low-vol using weighted Z-scores.
config = PreselectionConfig(
method=PreselectionMethod.COMBINED,
top_k=30,
lookback=252,
skip=21,
momentum_weight=0.6, # 60% momentum
low_vol_weight=0.4, # 40% low-vol
min_periods=126,
)
Process:
- Calculate raw momentum scores for all stocks
- Calculate raw low-vol scores for all stocks
- Standardize each factor to Z-scores (mean=0, std=1)
- Combine:
Combined = 0.6 × Momentum_Z + 0.4 × LowVol_Z - Select top 30 stocks by combined score
Intuition:
- 60% momentum: Emphasize trend-following (winners keep winning)
- 40% low-vol: Add defensive tilt (reduce drawdowns)
- Result: Stocks that are both rising AND stable
Example:
Stock A: Momentum Z-score = +1.5, Low-Vol Z-score = +0.8
Combined = 0.6×1.5 + 0.4×0.8 = 1.22 → Likely selected
Stock B: Momentum Z-score = +2.0, Low-Vol Z-score = -1.0
Combined = 0.6×2.0 + 0.4×(-1.0) = 0.80 → Maybe selected
Stock C: Momentum Z-score = -0.5, Low-Vol Z-score = +1.5
Combined = 0.6×(-0.5) + 0.4×1.5 = 0.30 → Probably not
Deterministic Tie-Breaking¶
If two stocks have identical scores, tiebreaker is alphabetical by ticker.
Why? Ensures reproducible results across runs (critical for testing).
Membership Policy (Turnover Control)¶
Why Membership Policy?¶
Problem: Without controls, portfolios churn too much:
- High transaction costs
- Tax inefficiency (realized capital gains)
- Unstable allocations
Solution: Membership policy enforces holding periods and limits turnover.
Configuration¶
from portfolio_management.portfolio import MembershipPolicy
policy = MembershipPolicy(
buffer_rank=40, # Keep existing if in top 40
min_holding_periods=4, # Hold at least 4 rebalances
max_turnover=0.20, # Max 20% turnover per rebalance
max_new_assets=5, # Add at most 5 new stocks
max_removed_assets=5, # Remove at most 5 stocks
)
How It Works¶
Scenario: Quarterly rebalancing, 30-stock portfolio.
Current holdings: Stocks ranked 1-30 at last rebalance.
New rankings: Preselection ranks all 100 stocks.
Without membership policy:
- Rebalance to top 30 (ranks 1-30)
- Could replace all 30 stocks (100% turnover!)
With membership policy:
- Buffer rank: Keep existing stocks if ranked ≤40 (even if outside top 30)
- Min holding: Can't remove stocks held \<4 quarters
- Max turnover: Limit total weight change to 20%
- Max new/removed: Add/remove at most 5 stocks
Example:
Existing Portfolio (30 stocks):
A (rank 1), B (rank 5), C (rank 35), D (rank 50), ...
New Rankings:
A (rank 1) → Keep (top 30)
B (rank 5) → Keep (top 30)
C (rank 35) → Keep (inside buffer rank 40)
D (rank 50) → Remove (outside buffer, held 4+ periods)
E (rank 45) → Remove (outside buffer, held 4+ periods)
New stocks to add:
X (rank 10) → Add
Y (rank 15) → Add
Z (rank 20) → Add
But wait! max_new_assets=5, so only add top 3 new stocks.
Result: Net turnover ≈ 10% (well below 20% limit)
Benefits¶
✅ Lower transaction costs (fewer trades) ✅ Tax efficiency (defer capital gains) ✅ Stable allocations (less portfolio drift) ✅ Better long-term performance (empirically proven)
Point-in-Time Eligibility¶
The Lookahead Bias Problem¶
Bad Practice:
# At rebalance date 2020-01-01, use data from 2020-01-31
returns = returns.loc["2019-01-01":"2020-01-31"] # WRONG!
This uses future information (data from January 2020 to make a decision on January 1, 2020).
Result: Unrealistic backtest performance (you can't trade on data you don't have yet).
Point-in-Time Solution¶
backtest_config = BacktestConfig(
use_pit_eligibility=True,
min_history_days=252, # Require 1 year before trading
)
What It Does:
- At each rebalance date, check each asset's data availability
- Only include assets with ≥252 days of prior data
- Exclude assets that:
- IPO'd recently (insufficient history)
- Delisted (no longer tradeable)
- Have data gaps (quality issues)
Example:
Rebalance Date: 2020-01-01
Stock A: First data = 2018-01-01 → 2 years history → Eligible ✓
Stock B: First data = 2019-11-01 → 2 months history → Not eligible ✗
Stock C: Delisted 2019-12-15 → Not tradeable → Not eligible ✗
Combined with Preselection¶
# Point-in-time preselection
eligible_assets = pit_filter(returns, rebalance_date) # Only past data
factor_scores = calculate_momentum(returns[eligible_assets]) # No lookahead
selected = factor_scores.nlargest(30) # Top 30 from eligible only
Result: Realistic backtest that could have been executed in real-time.
Statistics Caching¶
The Performance Problem¶
Without caching:
- Monthly rebalancing requires calculating covariance matrix each month
- Overlapping windows: Jan-Dec, Feb-Jan, Mar-Feb, ...
- Recalculating same 252×252 covariance repeatedly is slow
With caching:
- Calculate once, store on disk
- Reuse cached result if inputs haven't changed
- 5-10x speedup for repeated backtests
Configuration¶
from portfolio_management.data.factor_caching import FactorCache
cache = FactorCache(
cache_dir=Path(".cache/my_backtest"),
enabled=True,
ttl_days=None, # No expiration
)
What Gets Cached¶
- Factor scores (momentum, low-vol)
- Covariance matrices (for risk parity, mean-variance)
- Expected returns (for mean-variance)
- PIT eligibility (asset availability lookup)
Cache Invalidation¶
Cache automatically rebuilds when:
- Input data changes (prices, returns)
- Configuration changes (lookback, skip, weights)
- Rebalance date changes
Result: Always correct results, but much faster on repeated runs.
Example Performance¶
First backtest: 5 minutes (cold cache)
Second backtest: 30 seconds (warm cache)
Third backtest: 30 seconds (cached)
Speedup: 10x!
Complete Workflow Example¶
Scenario: S&P 500 Blue Chip Portfolio (2005-2025)¶
Goal: Build a 30-stock portfolio from S&P 500 blue chips using momentum + low-vol factors.
Step 1: Define Universe¶
# config/sp500_blue_chips.yaml
universes:
sp500_blue_chips:
description: "S&P 500 blue chips with 20+ year history"
filter_criteria:
data_status: ["ok"]
min_history_days: 7300 # 20 years
min_price_rows: 5040 # 20 years trading days
markets: ["NYSE", "NSQ"]
currencies: ["USD"]
categories:
- "nyse stocks/1" # Tier 1 large caps
- "nasdaq stocks/1"
constraints:
min_assets: 80
max_assets: 120
Step 2: Load Universe¶
Output: outputs/sp500/sp500_blue_chips_returns.csv (100 stocks × 5000+ days)
Step 3: Configure Preselection¶
preselection = Preselection(
PreselectionConfig(
method=PreselectionMethod.COMBINED,
top_k=30, # 100 → 30 stocks
lookback=252, # 1-year factors
skip=21, # Skip last month
momentum_weight=0.6,
low_vol_weight=0.4,
),
cache=cache,
)
Step 4: Configure Membership Policy¶
membership = MembershipPolicy(
buffer_rank=40,
min_holding_periods=4, # 1 year hold (quarterly rebalance)
max_turnover=0.20,
)
Step 5: Run Backtest¶
engine = BacktestEngine(
config=BacktestConfig(
start_date=date(2005, 1, 1),
end_date=date(2025, 1, 1),
rebalance_frequency=RebalanceFrequency.QUARTERLY,
use_pit_eligibility=True,
),
strategy=RiskParityStrategy(),
preselection=preselection,
membership_policy=membership,
)
result = engine.run(prices, returns)
Step 6: Review Results¶
print(f"Annualized Return: {result.metrics.annualized_return:.2%}")
print(f"Sharpe Ratio: {result.metrics.sharpe_ratio:.3f}")
print(f"Max Drawdown: {result.metrics.max_drawdown:.2%}")
print(f"Total Trades: {len(result.trade_log)}")
Expected Performance (20-year backtest):
- Return: 10-15% annualized
- Sharpe: 0.8-1.2
- Max Drawdown: 30-40% (2008 crisis)
- Trades: 300-500 (quarterly rebalancing)
Advanced Filtering Recipes¶
1. Conservative Dividend Portfolio¶
filter_criteria:
data_status: ["ok"]
min_history_days: 3650 # 10 years
markets: ["NYSE"] # Blue chip exchange only
classification_requirements:
sub_class: ["dividend"]
preselection:
method: "low_vol" # Defensive
top_k: 20
2. Aggressive Growth Tech¶
filter_criteria:
data_status: ["ok", "warning"] # Allow some risk
min_history_days: 1260 # 5 years
markets: ["NSQ"] # NASDAQ = tech-heavy
classification_requirements:
sub_class: ["growth", "tech"]
preselection:
method: "momentum" # Trend-following
top_k: 30
lookback: 126 # 6-month momentum (aggressive)
skip: 5 # Skip 1 week only
3. International Diversification¶
filter_criteria:
markets: ["LSE", "NYSE", "NSQ"] # US + UK
currencies: ["GBP", "USD", "EUR"]
classification_requirements:
geography: ["europe", "north_america", "asia"]
preselection:
method: "combined"
momentum_weight: 0.4 # Less momentum
low_vol_weight: 0.6 # More defensive
Running the Example¶
This demonstrates:
- ✅ Custom universe with quality filters
- ✅ Multi-factor preselection (60% momentum + 40% low-vol)
- ✅ Membership policy (4-quarter hold, 20% max turnover)
- ✅ Point-in-time eligibility (no lookahead)
- ✅ Factor caching (5-10x speedup)
- ✅ Transaction cost modeling
- ✅ Strategy comparison
Next Steps¶
- Experiment with factors: Try different momentum/low-vol weights
- Adjust membership: Test tighter/looser turnover limits
- Test strategies: Compare equal-weight, risk-parity, mean-variance
- Add constraints: Max sector exposure, max single-stock weight
- Long backtests: Test 20+ years including 2000, 2008, 2020 crises
See examples/sp500_blue_chips_advanced.py for complete working code!