Universe Configuration Guide (universes.yaml)¶
Overview¶
The config/universes.yaml file is the central blueprint for the entire portfolio workflow. It allows you to define a named, reusable set of rules and parameters that orchestrate the entire data pipeline, from asset selection through to return calculation.
Instead of running each script manually with many command-line arguments, you can define a universe once in this file. Then, you can use the scripts/manage_universes.py script to validate, load, or compare your defined universes, making your workflow repeatable and easy to manage.
Long-History Reference Universes¶
For large scale regression tests we maintain config/universes_long_history.yaml, which stores precomputed ticker rosters derived from the Stooq archive. The long_history_1000 entry now covers 1,000 assets with continuous daily prices from 2005-02-25 onward, excluding the previously gap-prone tickers (BT.A, SANM, SANM:US, AXGN, AXGN:US). The corresponding price and return matrices live under outputs/long_history_1000/ (with returns published as the compressed long_history_1000_returns_daily.csv.gz) and align with the metadata exported by the asset selection pipeline.
Role in the Workflow¶
This configuration file is the heart of the Managed Workflow. The intended end-to-end process is as follows:
-
Phase 1: Initial Data Preparation (Run Once) You run the
scripts/prepare_tradeable_data.pyscript once to scan all your raw data and generate the mastertradeable_matches.csvfile. -
Phase 2: Universe Definition (Manual Strategy Step) You edit this
universes.yamlfile to define the rules and parameters for your investment strategies (universes). -
Phase 3: Pipeline Execution (Managed by
manage_universes.py) You use thescripts/manage_universes.pyscript with a command likeload <your_universe_name>to automatically execute the full data pipeline (select, classify, calculate returns) based on the rules you defined here.
Workflow Overview¶
graph TD
A[`config/universes.yaml`]
B[`scripts/manage_universes.py`]
C[validate-file / compare-files / load]
D[Asset selection + classification + return calculation]
E[Outputs: selected assets, classified assets, returns, logs]
A --> B
B --> C
C --> D
D --> E
The diagram shows how the universe configuration feeds the CLI, which validates, compares, or loads universes and then delegates to the selection/classification/returns pipeline before producing the curated outputs.
Basic Structure¶
The file has a top-level universes: key. Underneath it, you can define one or more named universes. Each universe is a collection of settings that control the different stages of the pipeline.
universes:
my_first_universe:
# ... configuration for this universe ...
my_second_universe:
# ... configuration for this universe ...
Universe Definition¶
Each named universe has seven configuration blocks (four core blocks plus three optional advanced blocks):
Core Configuration Blocks¶
1. filter_criteria¶
This block defines the rules for the asset selection step. The keys in this block correspond directly to the command-line arguments of the scripts/select_assets.py script.
Example:
filter_criteria:
data_status: ["ok"]
min_history_days: 756
markets: ["LSE", "NYSE"]
currencies: ["GBP", "USD"]
# ... and any other arguments from select_assets.py
For a full list and detailed explanation of these criteria, see the docs/asset_selection.md documentation.
2. classification_requirements¶
This block provides a way to filter the asset list after it has been selected and classified. This allows you to create universes with a specific thematic focus.
Example:
classification_requirements:
# Only keep assets that were classified as equity
asset_class: ["equity"]
# And only from these specific geographies
geography: ["north_america", "united_kingdom"]
3. return_config¶
This block defines the parameters for the return calculation step. The keys in this block correspond directly to the command-line arguments of the scripts/calculate_returns.py script.
Example:
return_config:
method: "simple"
frequency: "monthly"
handle_missing: "forward_fill"
align_method: "outer"
min_coverage: 0.85
# ... and any other arguments from calculate_returns.py
For a full list and detailed explanation of these parameters, see the docs/calculate_returns.md documentation.
4. constraints¶
This block sets high-level constraints on the final size of the universe itself, ensuring that the list of assets passing all the above filters is within an expected range.
Example:
This ensures that after all filtering and classification, the final asset list used for portfolio construction will have between 30 and 50 assets.
Advanced Configuration Blocks (Optional)¶
These optional blocks enable advanced portfolio features introduced in Sprint 2:
5. preselection (Optional)¶
Controls factor-based universe reduction before portfolio optimization. Applies momentum, low-volatility, or combined factor ranking to select top-K assets.
Parameters:
method: Factor method (momentum,low_vol, orcombined)top_k: Number of assets to selectlookback: Lookback period in days (default: 252)skip: Skip N most recent days (default: 1)momentum_weight: Weight for momentum in combined method (default: 0.5)low_vol_weight: Weight for low-volatility in combined method (default: 0.5)min_periods: Minimum data periods required (default: 60)
Example:
preselection:
method: "combined" # momentum, low_vol, or combined
top_k: 30 # Select top 30 assets
lookback: 252 # 1 year lookback
skip: 1 # Skip most recent day
momentum_weight: 0.6 # 60% momentum
low_vol_weight: 0.4 # 40% low-volatility
min_periods: 60 # Require at least 60 days
See docs/preselection.md for detailed documentation.
6. membership_policy (Optional)¶
Controls portfolio turnover and asset churn during rebalancing. Prevents excessive changes to portfolio composition.
Parameters:
buffer_rank: Rank buffer to protect existing holdings (default: 5)min_holding_periods: Minimum rebalance periods to hold an asset (default: 3)max_turnover: Maximum portfolio turnover per rebalance (0-1)max_new_assets: Maximum new assets to add per rebalancingmax_removed_assets: Maximum assets to remove per rebalancing
Example:
membership_policy:
buffer_rank: 10 # Protect holdings within top 40 if top_k=30
min_holding_periods: 3 # Hold for at least 3 rebalances
max_turnover: 0.30 # Maximum 30% turnover
max_new_assets: 5 # Add maximum 5 new assets
max_removed_assets: 3 # Remove maximum 3 assets
See docs/membership_policy.md for detailed documentation.
7. pit_eligibility (Optional)¶
Controls point-in-time eligibility filtering to prevent lookahead bias. Ensures assets have sufficient history at each rebalance date.
Parameters:
enabled: Enable PIT eligibility filtering (boolean)min_history_days: Minimum days of history required (default: 252)min_price_rows: Minimum price observations required (default: 252)
Example:
pit_eligibility:
enabled: true # Enable PIT filtering
min_history_days: 252 # Require 1 year of history
min_price_rows: 252 # Require 252 price observations
See docs/pit_eligibility_edge_cases.md for edge case documentation.
Complete Examples¶
Example 1: Basic Universe (Core Blocks Only)¶
Here is a basic universe definition using only the core configuration blocks. This is based on the core_global universe from the project's default configuration.
universes:
core_global:
# A human-readable description of the universe's goal
description: "Global core sleeve of diversified ETFs"
# --- Asset Selection (filter_criteria) ---
filter_criteria:
# Only include assets with clean data quality
data_status: ["ok"]
# Assets must have at least 756 calendar days (~3 years) of history
min_history_days: 756
min_price_rows: 756
# Restrict to assets trading on the London Stock Exchange
markets: ["LSE", "GBR-LSE"]
# Restrict to assets traded in Great British Pounds
currencies: ["GBP"]
# Custom filter specific to the project's data
categories:
- "lse etfs/1"
- "lse etfs/2"
# --- Post-Classification Filter (classification_requirements) ---
classification_requirements:
# Only keep assets that fall into these classes
asset_class: ["equity", "commodity", "real_estate"]
geography: ["united_kingdom", "north_america"]
# --- Return Calculation (return_config) ---
return_config:
method: "simple"
frequency: "monthly"
handle_missing: "forward_fill"
max_forward_fill_days: 5
min_periods: 5
align_method: "outer"
min_coverage: 0.85
# --- Universe Size Constraints (constraints) ---
constraints:
min_assets: 30
max_assets: 50
Example 2: Advanced Universe (All Features)¶
This example shows a universe definition with all optional advanced features enabled, suitable for production use with large universes and monthly rebalancing.
universes:
production_momentum_tilted:
description: "Production momentum-tilted portfolio with turnover control"
# --- Asset Selection ---
filter_criteria:
data_status: ["ok"]
min_history_days: 1260 # 5 years for robust statistics
min_price_rows: 1000
markets: ["NYSE", "NASDAQ", "LSE"]
currencies: ["USD", "GBP"]
# --- Post-Classification Filter ---
classification_requirements:
asset_class: ["equity"]
geography: ["united_states", "united_kingdom"]
# --- Return Calculation ---
return_config:
method: "simple"
frequency: "daily"
handle_missing: "forward_fill"
max_forward_fill_days: 3
min_periods: 10
align_method: "inner"
min_coverage: 0.90
# --- Universe Size ---
constraints:
min_assets: 40
max_assets: 60
# --- Preselection: Select top 50 by combined momentum/low-vol ---
preselection:
method: "combined"
top_k: 50
lookback: 252
skip: 21 # Skip most recent month
momentum_weight: 0.7 # Heavy momentum tilt
low_vol_weight: 0.3 # Light defensive tilt
min_periods: 180 # 180 days minimum
# --- Membership Policy: Strict turnover control ---
membership_policy:
buffer_rank: 15 # Large buffer for stability
min_holding_periods: 5 # Hold for at least 5 rebalances
max_turnover: 0.25 # Maximum 25% turnover
max_new_assets: 3 # Add maximum 3 new assets
max_removed_assets: 2 # Remove maximum 2 assets
# --- PIT Eligibility: Prevent lookahead bias ---
pit_eligibility:
enabled: true
min_history_days: 504 # Require 2 years
min_price_rows: 400 # Require 400 observations
Example 3: Defensive Universe (Low-Vol Focus)¶
A universe designed for defensive, low-volatility investing with monthly rebalancing.
universes:
defensive_low_vol:
description: "Defensive low-volatility portfolio"
filter_criteria:
data_status: ["ok"]
min_history_days: 756
markets: ["NYSE", "NASDAQ"]
currencies: ["USD"]
classification_requirements:
asset_class: ["equity", "fixed_income"]
return_config:
method: "simple"
frequency: "daily"
handle_missing: "forward_fill"
min_coverage: 0.85
constraints:
min_assets: 25
max_assets: 40
# Preselection: Pure low-volatility
preselection:
method: "low_vol"
top_k: 30
lookback: 252
min_periods: 120
# Membership: Very stable (defensive strategy)
membership_policy:
buffer_rank: 20
min_holding_periods: 6
max_turnover: 0.15 # Very low turnover
pit_eligibility:
enabled: true
min_history_days: 252
Example 4: Large Universe with Aggressive Preselection¶
For backtesting very large universes (500+ assets), use aggressive preselection.
universes:
large_universe_momentum:
description: "Large universe with aggressive momentum preselection"
filter_criteria:
data_status: ["ok"]
min_history_days: 252
# No market/currency filters - global universe
return_config:
method: "simple"
frequency: "daily"
constraints:
min_assets: 100 # Start with 100+ assets
max_assets: 1000 # Can handle up to 1000
# Preselection: Reduce to manageable size
preselection:
method: "momentum"
top_k: 50 # Aggressive reduction: 1000 → 50
lookback: 252
skip: 1
# Membership: Moderate control
membership_policy:
buffer_rank: 10
min_holding_periods: 3
max_turnover: 0.30
pit_eligibility:
enabled: true
min_history_days: 252
Validation Process¶
The toolkit provides validation tools to check universe configurations before running backtests.
Command-Line Validation¶
Use manage_universes.py to validate a universe definition:
# Validate a single universe
python scripts/manage_universes.py validate core_global
# Validate all universes in a file
python scripts/manage_universes.py validate-all
Validation Checks:
- YAML Syntax: Verifies the file is valid YAML
- Required Blocks: Ensures
filter_criteria,return_config, andconstraintsare present - Parameter Types: Validates that parameters have correct types (int, float, string, list)
- Parameter Ranges: Checks that numeric values are within valid ranges
- Block Dependencies: Verifies optional blocks have required parameters
- Constraint Consistency: Checks that
min_assets≤max_assets
Example Output:
✓ Universe 'core_global' is valid
✓ All required blocks present
✓ Parameter types correct
✓ Constraints consistent
✓ Preselection configuration valid
✓ Membership policy configuration valid
✓ PIT eligibility configuration valid
Validation Warnings¶
The validation process may emit warnings for non-fatal issues:
Common Warnings:
- Large Top-K:
preselection.top_k > 100- May not provide sufficient universe reduction - Small Min-Hold:
membership_policy.min_holding_periods < 3- May lead to excessive turnover - High Turnover:
membership_policy.max_turnover > 0.5- May incur high transaction costs - Short History:
pit_eligibility.min_history_days < 252- May include immature assets
Handling Warnings:
# Run with strict validation (warnings become errors)
python scripts/run_backtest.py risk_parity --strict
# Suppress warnings (not recommended)
python scripts/run_backtest.py risk_parity --ignore-warnings
Programmatic Validation¶
You can validate configurations programmatically:
from portfolio_management.config.validation import (
validate_preselection_config,
validate_membership_config,
validate_pit_config,
)
# Validate preselection
warnings = validate_preselection_config(preselection_dict)
for warning in warnings:
print(f"Warning: {warning.message}")
# Validate membership policy
warnings = validate_membership_config(membership_dict)
# Validate PIT eligibility
warnings = validate_pit_config(pit_dict)
Workflow: Export and Compare¶
Load and Run a Universe¶
# Load universe and run full pipeline
python scripts/manage_universes.py load core_global
# This automatically executes:
# 1. select_assets.py (using filter_criteria)
# 2. classify_assets.py
# 3. calculate_returns.py (using return_config)
Compare Multiple Universes¶
Compare asset selection across different universe definitions:
# Compare two universes
python scripts/manage_universes.py compare core_global defensive_low_vol
# Output shows:
# - Assets unique to each universe
# - Assets in common
# - Differences in universe size
Example Output:
Universe Comparison: core_global vs defensive_low_vol
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Core Stats:
core_global: 42 assets
defensive_low_vol: 28 assets
In common: 18 assets
Unique to core_global (24 assets):
AAPL, TSLA, GOOGL, ...
Unique to defensive_low_vol (10 assets):
TLT, IEF, BND, ...
Overlap: 42.9%
Export Universe¶
Export a processed universe to CSV for use in other tools:
# Export asset list
python scripts/manage_universes.py export core_global \
--output universes/core_global_assets.csv
# Export with returns
python scripts/manage_universes.py export core_global \
--include-returns \
--output universes/core_global_full.csv
Best Practices¶
Configuration Organization¶
File Structure:
config/
├── universes.yaml # Main production universes
├── universes_test.yaml # Test/development universes
├── universes_long_history.yaml # Historical regression test universes
└── universes_experimental.yaml # Experimental configurations
Naming Conventions:
core_*: Core portfolio sleeves (main allocations)satellite_*: Satellite sleeves (tactical tilts)defensive_*: Defensive/low-volatility strategiesmomentum_*: Momentum-tilted strategiestest_*: Testing configurations
Parameter Selection¶
Universe Size:
- Small portfolios (20-30 assets): Better for concentrated strategies
- Medium portfolios (30-50 assets): Good balance of diversification and simplicity
- Large portfolios (50-100+ assets): Maximum diversification but higher complexity
Preselection Top-K:
- Conservative:
top_k = 0.5 × starting_universe(select half) - Moderate:
top_k = 0.2 × starting_universe(select 20%) - Aggressive:
top_k = 30-50(fixed target regardless of starting size)
Membership Policy:
- Stable portfolios: High
buffer_rank(10-20), highmin_holding_periods(5-8) - Balanced portfolios: Medium
buffer_rank(5-10), mediummin_holding_periods(3-5) - Active portfolios: Low
buffer_rank(2-5), lowmin_holding_periods(1-2)
PIT Eligibility:
- Always enable for production backtests (prevents lookahead bias)
min_history_days = rebalance_lookback: Ensure enough data for first rebalance- Typical:
min_history_days = 252(1 year) or504(2 years)
Version Control¶
Track Universe Changes:
# Tag universe configurations with versions
git tag -a universes-v1.0 -m "Initial production universe configurations"
git push origin universes-v1.0
# Document changes in commit messages
git commit -m "Update core_global: Increase min_history to 1260 days (5 years)"
Review Process:
- Create new universe in experimental file
- Validate configuration
- Run backtest on historical data
- Compare with existing universes
- Move to production file after approval
Troubleshooting¶
Issue: Universe Too Small After Filtering¶
Symptom: Universe has fewer assets than min_assets
Solutions:
- Relax
filter_criteria(reducemin_history_days, add more markets) - Reduce
min_assetsconstraint - Remove or relax
classification_requirements - Check data availability (may need more data sources)
Issue: Preselection Reduces Universe Too Much¶
Symptom: Preselection selects fewer than expected assets
Causes:
- Many assets have insufficient data for factor calculation
min_periodsis too strictlookbackperiod exceeds available history
Solutions:
- Reduce
preselection.min_periods - Reduce
preselection.lookback - Enable PIT eligibility to filter before preselection
Issue: Membership Policy Too Restrictive¶
Symptom: Portfolio never rebalances or adds new assets
Solutions:
- Reduce
buffer_rank(less protective) - Reduce
min_holding_periods(allow earlier exits) - Increase
max_turnover(allow more changes) - Increase
max_new_assetsandmax_removed_assets
Issue: Validation Warnings¶
Common Fixes:
| Warning | Fix |
|---|---|
| Large top-K | Reduce preselection.top_k to 30-50 |
| Small min-hold | Increase membership_policy.min_holding_periods to 3+ |
| High turnover | Reduce membership_policy.max_turnover to ≤0.30 |
| Short history | Increase pit_eligibility.min_history_days to 252+ |
See Also¶
- Asset Selection - Filter criteria details
- Preselection - Factor-based universe reduction
- Membership Policy - Turnover control
- Backtesting - Running backtests with universes
- Portfolio Construction - Strategy selection