Portfolio Construction Script: construct_portfolio.py¶
Overview¶
This script is the fifth step in the portfolio management toolkit's data pipeline: the point where financial theory is applied to the historical returns data to construct a portfolio.
Its purpose is to take the matrix of asset returns and, based on a chosen investment strategy and a set of user-defined constraints, calculate the optimal weight (or allocation percentage) for each asset.
Like the other scripts, this file acts as a command-line orchestrator for the core logic encapsulated in the PortfolioConstructor class.
Inputs (Prerequisites)¶
This script requires one primary input and one optional input from previous steps:
- **Returns Matrix CSV** (Required)
    - Generated by: `scripts/calculate_returns.py`
    - Specified via: `--returns`
    - Purpose: This file provides the historical return data for all assets, which is the foundation for all quantitative strategies.
- **Classifications CSV** (Optional)
    - Generated by: `scripts/classify_assets.py`
    - Specified via: `--classifications`
    - Purpose: This file is only needed if you intend to use constraints based on asset classes, such as `--max-equity` or `--min-bond`.
Script Products¶
The script's output depends on its mode of operation:
- **Portfolio Weights CSV** (Primary Product)
    - Description: When running with a single strategy, the script produces a CSV file with two columns: one for the asset tickers and one for their calculated `weight` in the portfolio. This file is the direct input for the `run_backtest.py` script.
- **Strategy Comparison CSV**
    - Description: When running with the `--compare` flag, the script produces a CSV file where the rows are the asset tickers and each column represents a different investment strategy. The values in the table are the weights assigned to each asset by each strategy. This is useful for analysis and comparison.
Features in Detail¶
Portfolio Construction Strategies¶
The toolkit implements three well-established portfolio construction strategies, each with different optimization objectives:
1. Equal Weight (equal_weight)¶
The simplest baseline strategy. Assigns equal weight (1/N) to every asset in the universe.
Characteristics:
- Optimization objective: None (deterministic 1/N allocation)
- Computational cost: O(n) - fastest
- Best for: Small universes, benchmark comparison, when you believe in equal risk contribution without correlation assumptions
Advantages:
- No estimation error (no parameters to estimate)
- Robust to outliers and data quality issues
- No risk of optimization failure
- Computationally trivial
Disadvantages:
- Ignores correlations and risk differences between assets
- Can lead to concentrated risk if assets are highly correlated
- No consideration of expected returns or volatility
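The 1/N rule needs no estimation at all. A minimal sketch (the function name is illustrative, not the toolkit's `PortfolioConstructor` API):

```python
def equal_weight(tickers):
    """Deterministic 1/N allocation: every asset gets the same weight."""
    n = len(tickers)
    return {t: 1.0 / n for t in tickers}

weights = equal_weight(["SPY", "AGG", "GLD", "QQQ"])
# each of the four assets receives weight 0.25
```

Because there are no parameters to estimate, this allocation is identical on every rebalance regardless of the return history.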
2. Risk Parity (risk_parity)¶
Allocates capital so that each asset contributes equally to overall portfolio risk (volatility).
Characteristics:
- Optimization objective: Equal risk contribution from each asset
- Computational cost: O(n²) - moderate (covariance computation)
- Best for: Medium to large universes (30-300 assets), when you want balanced risk exposure
Advantages:
- Diversifies risk exposure evenly across assets
- Computationally efficient (faster than mean-variance)
- More stable than mean-variance (uses only volatility, not expected returns)
- Handles correlations explicitly
Disadvantages:
- Can overweight low-volatility (often defensive) assets
- Ignores expected returns entirely
- May concentrate in bonds/cash during high market volatility
Large-Universe Behavior:
- Automatically falls back to inverse-volatility when >300 assets
- Applies covariance matrix stabilization (diagonal jitter) when near-singular
- All portfolio constraints still enforced in fallback mode
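The inverse-volatility fallback mentioned above weights each asset in proportion to 1/σ, which equalizes risk contributions exactly when assets are uncorrelated. A sketch of that fallback (full risk parity additionally accounts for correlations):

```python
import numpy as np

def inverse_volatility_weights(returns):
    """Inverse-volatility allocation: w_i proportional to 1/sigma_i.
    Illustrative sketch of the large-universe fallback, not the
    toolkit's exact implementation."""
    vols = returns.std(axis=0)      # per-asset return volatility
    inv = 1.0 / vols
    return inv / inv.sum()          # normalize to sum to 1

rng = np.random.default_rng(0)
# three assets with low / medium / high volatility
rets = rng.normal(0.0, [0.01, 0.02, 0.04], size=(252, 3))
w = inverse_volatility_weights(rets)
# the lowest-volatility asset receives the largest weight
```

This illustrates the "overweights low-volatility assets" caveat above: defensive assets dominate unless constraints push back.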
3. Mean-Variance (mean_variance_max_sharpe, mean_variance_min_volatility)¶
Finds the portfolio with optimal risk-adjusted return using Markowitz's mean-variance optimization.
Two variants:
- `mean_variance_max_sharpe`: Maximizes the Sharpe ratio (risk-adjusted return)
- `mean_variance_min_volatility`: Minimizes portfolio volatility
Characteristics:
- Optimization objective: Max Sharpe or min volatility
- Computational cost: O(n³) - slowest (quadratic programming)
- Best for: Small to medium universes (<100 assets), when you have confidence in return estimates
Advantages:
- Theoretically optimal (maximizes expected utility)
- Explicitly considers expected returns
- Can target specific risk/return profiles
- Backed by decades of academic research
Disadvantages:
- Highly sensitive to estimation errors in expected returns
- Computationally expensive for large universes
- May produce extreme/concentrated weights
- Can be unstable across rebalancing periods
Large-Universe Behavior:
- Applies covariance shrinkage for numerical stability
- Falls back to closed-form tangency solution if optimizer fails
- Sanitizes return inputs before optimization
- All constraints validated after optimization
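The closed-form tangency fallback mentioned above solves w ∝ Σ⁻¹(μ − rf) directly instead of running a quadratic program. A sketch under simplifying assumptions (long-only handled by crude clipping; the toolkit's actual fallback may differ):

```python
import numpy as np

def tangency_weights(mu, cov, rf=0.0):
    """Closed-form max-Sharpe (tangency) portfolio: w ~ inv(cov) @ (mu - rf).
    Clipping negatives and renormalizing is a rough long-only step,
    not a true constrained optimization."""
    raw = np.linalg.solve(cov, mu - rf)
    raw = np.clip(raw, 0.0, None)   # forbid short positions
    return raw / raw.sum()

mu = np.array([0.08, 0.05, 0.03])   # toy annualized expected returns
cov = np.diag([0.04, 0.02, 0.01])   # toy diagonal covariance
w = tangency_weights(mu, cov)
# weights sum to 1; the best risk-adjusted asset gets the most capital
```

Note how directly the weights depend on `mu`: small changes in expected returns move the allocation a lot, which is exactly the estimation-error sensitivity listed under disadvantages.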
Large-Universe Hardening¶
Running the optimizer across hundreds of tickers can surface numerical edge cases. The strategies now include guard rails so 1,000-name universes stay tractable without hand-tuning:

- `risk_parity` automatically falls back to an inverse-volatility solution when more than 300 assets are present, while still enforcing all portfolio constraints, and it stabilizes the covariance matrix with a light diagonal "jitter" whenever the inputs are nearly singular.
- `mean_variance_*` sanitizes return inputs, applies shrinkage to the covariance matrix, and will switch to a closed-form tangency approximation when the universe is extremely large or PyPortfolioOpt struggles to converge. The resulting weights are normalized and validated against the same constraint set as the main optimizer.

These safeguards match the configuration used for the `long_history_1000` runs and remove the need for ad-hoc ticker pruning when scaling up experiments.
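The diagonal "jitter" idea can be sketched as follows (the condition-number threshold and jitter size are illustrative assumptions, not the toolkit's exact values):

```python
import numpy as np

def stabilize_cov(cov, jitter=1e-8, max_cond=1e12):
    """Add a small multiple of the identity to a near-singular
    covariance matrix so downstream solvers stay well-conditioned.
    Sketch only; thresholds are assumed, not taken from the toolkit."""
    if np.linalg.cond(cov) > max_cond:
        scale = np.trace(cov) / len(cov)        # average variance
        cov = cov + jitter * scale * np.eye(len(cov))
    return cov

# two perfectly correlated assets -> exactly singular covariance
cov = np.array([[0.04, 0.04],
                [0.04, 0.04]])
fixed = stabilize_cov(cov)
# 'fixed' is now invertible, at the cost of a tiny variance perturbation
```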
Portfolio Constraints¶
Constraints are rules that the final portfolio must obey. They allow you to enforce your own views or risk limits on top of the chosen strategy.
- `--max-weight` / `--min-weight`: Sets the maximum and minimum allowable weight for any single asset.
- `--max-equity`: Sets the maximum total allocation to all assets classified as "equity".
- `--min-bond`: Sets the minimum total allocation to all assets classified as "fixed_income" or "cash".
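To make the constraint semantics concrete, here is an illustrative validation helper (not the `PortfolioConstructor` API; the class labels match those produced by `classify_assets.py`):

```python
def constraints_satisfied(weights, classes, max_weight=1.0, min_weight=0.0,
                          max_equity=1.0, min_bond=0.0):
    """Check a weight dict against per-asset and per-class limits.
    Illustrative sketch of the rules the CLI flags express."""
    equity = sum(w for t, w in weights.items() if classes.get(t) == "equity")
    bond = sum(w for t, w in weights.items()
               if classes.get(t) in ("fixed_income", "cash"))
    per_asset_ok = all(min_weight <= w <= max_weight for w in weights.values())
    return per_asset_ok and equity <= max_equity and bond >= min_bond

weights = {"SPY": 0.6, "AGG": 0.3, "BIL": 0.1}
classes = {"SPY": "equity", "AGG": "fixed_income", "BIL": "cash"}
constraints_satisfied(weights, classes, max_equity=0.80, min_bond=0.15)
# -> True: 60% equity <= 80% cap, 40% bond/cash >= 15% floor
```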
Strategy Comparison Mode¶
Instead of building a single portfolio, you can use the --compare flag. This tells the script to run all available strategies and generate a single table comparing the weights produced by each, making it easy to see how different theories allocate capital to the same set of assets.
Workflow Overview¶
```mermaid
flowchart LR
    A["Returns matrix CSV\n`--returns`"]
    B["Classifications CSV\n`--classifications` (optional)"]
    C["PortfolioConstructor engine\nstrategy + constraints"]
    D{"Strategy choice"}
    E["Equal Weight strategy"]
    F["Risk Parity strategy"]
    G["Mean-Variance variants"]
    H["Constraints enforcement\nweights filtered by `--max-equity`, etc."]
    I["Outputs\nweights CSV / comparison table"]
    A --> C
    B --> C
    C --> D
    D --> E
    D --> F
    D --> G
    E --> H
    F --> H
    G --> H
    H --> I
```
The diagram highlights that returns (and optional classifications) feed the constructor, which branches by strategy, applies constraint checks, and emits the final weights/comparison artifacts.
Usage Examples¶
```bash
# Construct a simple equal-weight portfolio
python scripts/construct_portfolio.py \
    --returns data/processed/returns.csv \
    --strategy equal_weight \
    --output outputs/portfolio_equal_weight.csv

# Construct a Mean-Variance portfolio with exposure constraints
python scripts/construct_portfolio.py \
    --returns data/processed/returns.csv \
    --classifications data/processed/classified_assets.csv \
    --strategy mean_variance_max_sharpe \
    --max-equity 0.80 \
    --min-bond 0.15 \
    --output outputs/portfolio_mv.csv

# Compare all available strategies
python scripts/construct_portfolio.py \
    --returns data/processed/returns.csv \
    --compare \
    --output outputs/portfolio_comparison.csv
```
Strategy Selection Guidance¶
Decision Framework¶
Choose your strategy based on three key factors:
- Universe Size: How many assets are you investing in?
- Data Quality: How reliable are your return estimates?
- Computational Budget: How much time/resources do you have?
| Universe Size | Data Quality | Recommended Strategy | Rationale |
|---|---|---|---|
| Small (<30) | Any | Mean-Variance Max Sharpe | Optimization tractable, can fully exploit return estimates |
| Medium (30-100) | High | Mean-Variance or Risk Parity | Both feasible, choose based on return confidence |
| Medium (30-100) | Low | Risk Parity | More robust to estimation error |
| Large (100-300) | Any | Risk Parity | Mean-variance too slow/unstable |
| Very Large (300+) | Any | Equal Weight + Preselection | Use preselection to reduce universe first |
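The decision table can be encoded as a simple helper, which also makes the thresholds easy to adjust (an illustrative function, not part of the toolkit):

```python
def recommend_strategy(n_assets, data_quality="low"):
    """Map universe size and data quality to a strategy name,
    following the decision table above."""
    if n_assets < 30:
        return "mean_variance_max_sharpe"
    if n_assets <= 100:
        # both feasible; choose based on confidence in return estimates
        return ("mean_variance_max_sharpe" if data_quality == "high"
                else "risk_parity")
    if n_assets <= 300:
        return "risk_parity"
    # very large: equal weight, with preselection to shrink the universe first
    return "equal_weight"

recommend_strategy(50, "low")   # -> "risk_parity"
```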
When to Use Each Strategy¶
Equal Weight:
- ✓ Benchmark comparison
- ✓ Unknown/unreliable correlations
- ✓ Very large universes (after preselection)
- ✓ Regulatory/policy constraints on optimization
- ✗ When you have good risk/return estimates
- ✗ When assets have very different volatilities
Risk Parity:
- ✓ Want balanced risk exposure
- ✓ Medium-large universes (30-300 assets)
- ✓ Distrust return forecasts
- ✓ Assets with different volatilities
- ✗ When you have high-confidence return views
- ✗ Very large universes (>300, unless with preselection)
Mean-Variance:
- ✓ Have reliable return estimates
- ✓ Small-medium universes (<100 assets)
- ✓ Want theoretically optimal allocation
- ✓ Can tolerate concentration
- ✗ Poor quality data
- ✗ Large universes (computational cost)
- ✗ Need stable weights across rebalancing
Combining with Preselection¶
For large universes, combine preselection with strategy:
```bash
# Large universe → preselect → optimize
python scripts/run_backtest.py risk_parity \
    --preselect-method combined \
    --preselect-top-k 50
# Now only 50 assets go into risk parity optimization
```
Recommended Combinations:
- Equal Weight + Momentum Preselection: Simple factor tilt
- Risk Parity + Combined Preselection: Balanced risk on selected universe
- Mean-Variance + Low-Vol Preselection: Defensive optimization
See docs/preselection.md for preselection details.
Optimization Troubleshooting¶
Common Issues and Solutions¶
Issue 1: Optimizer Fails to Converge¶
Symptoms:
- Error: "Optimization failed" or "No solution found"
- Mean-variance returns no weights
Causes:
- Ill-conditioned covariance matrix (near-singular)
- Infeasible constraints (e.g., min-bond + max-equity impossible to satisfy)
- Insufficient data (not enough history)
Solutions:
- Reduce universe size: Use preselection to filter to top-K assets
- Relax constraints: Check that constraints are feasible
```bash
# Make sure min + max constraints are compatible
--max-equity 0.90 --min-bond 0.10  # Sum = 1.0, feasible
```
- Switch strategy: Try risk parity (more robust) or equal weight (always works)
- Check data quality: Verify returns have sufficient history
Issue 2: Extreme/Concentrated Weights¶
Symptoms:
- One or few assets get >50% allocation
- Many assets get zero or near-zero weight
Causes:
- Mean-variance amplifying estimation errors
- High correlation between assets
- Extreme return forecasts
Solutions:
- Add position limits with `--max-weight` / `--min-weight`
- Use risk parity instead (naturally more diversified)
- Apply preselection to reduce correlation
- Switch to equal weight if optimization is unreliable
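Position limits are typically enforced by capping and redistributing. A sketch of one common post-processing approach (not necessarily how `PortfolioConstructor` implements `--max-weight`):

```python
import numpy as np

def cap_weights(weights, max_w=0.10):
    """Cap each position at max_w and redistribute the excess pro-rata
    to uncapped assets, iterating until no position exceeds the cap.
    Illustrative sketch; assumes max_w * n_assets >= 1 is feasible."""
    w = np.asarray(weights, dtype=float)
    for _ in range(100):                     # bounded iteration for safety
        over = w > max_w
        if not over.any():
            break
        excess = (w[over] - max_w).sum()
        w[over] = max_w
        free = ~over
        if not free.any():                   # cap infeasible: everything capped
            break
        w[free] += excess * w[free] / w[free].sum()
    return w

w = cap_weights([0.60, 0.25, 0.10, 0.05], max_w=0.40)
# the 60% position is capped at 40%; the excess flows to the others
```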
Issue 3: Unstable Weights Across Rebalancing¶
Symptoms:
- Weights change dramatically between rebalances
- High turnover
- Same strategy produces very different allocations
Causes:
- Mean-variance sensitivity to small data changes
- Short lookback window
- Noisy return estimates
Solutions:
- Use a membership policy to stabilize holdings (see the Membership Policy docs)
- Increase the lookback window
- Switch to risk parity (more stable than mean-variance)
- Apply preselection (filters out marginal assets)
Issue 4: Out of Memory¶
Symptoms:
- Python crashes with MemoryError
- System runs out of RAM during optimization
Causes:
- Large universe (>500 assets)
- Mean-variance O(n³) memory usage
- Long lookback window
Solutions:
- Use preselection to reduce the universe
- Switch to risk parity or equal weight (lower memory):
    - Risk parity: O(n²) instead of O(n³)
    - Equal weight: O(n)
- Reduce the lookback window
- Run on a machine with more RAM, or use a smaller universe
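A quick way to gauge whether a universe will fit in memory is to estimate the covariance-matrix footprint, since the solver may hold several such matrices at once (a rough back-of-the-envelope helper, not a toolkit function):

```python
def cov_matrix_memory_mb(n_assets, bytes_per_float=8):
    """Memory for one dense float64 n x n covariance matrix, in MB.
    Solvers typically allocate several copies (factorizations,
    intermediates), so multiply by a small factor in practice."""
    return n_assets ** 2 * bytes_per_float / 1e6

cov_matrix_memory_mb(1000)   # -> 8.0 MB per matrix for a 1,000-asset universe
```

Preselecting down to 50 assets shrinks that single matrix by a factor of 400, which is why preselection is the first remedy listed above.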
Issue 5: Negative Weights (Short Positions)¶
Symptoms:
- Some assets have negative weights
- Portfolio seems to be shorting assets
Causes:
- Constraints allow shorting (not default behavior)
- Bug in constraint enforcement
Solutions:
- Set the minimum weight to zero (no shorting) with `--min-weight 0`
- Verify constraint enforcement in the output
- Report an issue if negative weights appear despite `--min-weight 0`
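A simple sanity check on the output weights CSV can catch this before backtesting (an illustrative helper, not part of the toolkit):

```python
def assert_long_only(weights, tol=1e-9):
    """Raise if any weight is meaningfully negative; small negative
    values within 'tol' are treated as solver noise."""
    shorts = {t: w for t, w in weights.items() if w < -tol}
    if shorts:
        raise ValueError(f"short positions found: {shorts}")
    return True

assert_long_only({"SPY": 0.7, "AGG": 0.3})   # passes silently
```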
Performance Tips¶
For Small Universes (<50 assets):
- All strategies work well
- Mean-variance is tractable
- Focus on data quality over computational efficiency
For Medium Universes (50-200 assets):
- Prefer risk parity over mean-variance
- Consider preselection for mean-variance
- Enable caching for faster rebalancing
For Large Universes (200+ assets):
- Always use preselection first
- Prefer equal weight or risk parity
- Mean-variance not recommended unless preselected to <100 assets
- Enable caching and membership policy
General Best Practices:
- Start with equal weight (baseline)
- Test risk parity (usually best balance)
- Try mean-variance only if <100 assets and good data
- Always run strategy comparison (`--compare`) first
- Use membership policy in production to control turnover
CLI Reference¶
All CLI parameters for scripts/construct_portfolio.py live in the CLI Reference; this guide focuses on strategy selection, constraints, and usage patterns.
Cardinality Constraints¶
Cardinality constraints limit the number of positions in the portfolio. The toolkit supports two approaches:
- **Preselection (Current)**: Use factor-based preselection to filter assets before optimization
    - See `docs/preselection.md` for details
    - Fast, deterministic, factor-driven
    - No special solver requirements
- **Optimizer-Integrated (Future)**: Enforce cardinality within the optimization
    - See `docs/cardinality_constraints.md` for design details
    - Methods: MIQP, heuristics, relaxation
    - Currently design stubs only
The preselection approach is production-ready and recommended for most use cases. The optimizer-integrated approach provides extension points for future development when commercial solvers or heuristic algorithms are needed.
For more information:
- Current implementation: `docs/preselection.md`
- Future design: `docs/cardinality_constraints.md`
See Also¶
- Backtesting - Running backtests with strategies
- Preselection - Factor-based universe reduction
- Membership Policy - Turnover control during rebalancing
- Universes - Universe configuration with YAML
- Cardinality Constraints - Future optimizer-integrated cardinality