Portfolio Construction Script: `construct_portfolio.py`

Overview

This script is the fifth step in the portfolio management toolkit's data pipeline. It is where financial theory is applied to the historical returns data to build a portfolio.

Its purpose is to take the matrix of asset returns and, based on a chosen investment strategy and a set of user-defined constraints, calculate the target weight (allocation percentage) for each asset.

Like the other scripts, this file is a command-line orchestrator for the core logic encapsulated in the `PortfolioConstructor` class.

Inputs (Prerequisites)

This script requires one input and accepts one optional input, both produced by earlier pipeline steps:

- Returns matrix CSV (required)
  - Generated by: `scripts/calculate_returns.py`
  - Specified via: `--returns`
  - Purpose: Provides the historical return data for all assets, which is the foundation of every quantitative strategy.
- Classifications CSV (optional)
  - Generated by: `scripts/classify_assets.py`
  - Specified via: `--classifications`
  - Purpose: Needed only if you use constraints based on asset classes, such as `--max-equity` or `--min-bond`.

Script Products

The script's output depends on its mode of operation:

- Portfolio weights CSV (primary product): When running a single strategy, the script writes a CSV file with two columns, the asset ticker and its calculated weight in the portfolio. This file is the direct input to the `run_backtest.py` script.
- Strategy comparison CSV: When running with the `--compare` flag, the script writes a CSV file whose rows are the asset tickers and whose columns are the investment strategies. Each cell holds the weight that strategy assigns to that asset, which is useful for analysis and comparison.
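Here is a minimal sketch of the single-strategy file layout using only the standard library. The header names `ticker` and `weight` and both helper functions are assumptions for illustration. Check a real output file for the exact column names the script writes.

```python
import csv
import io


def write_weights_csv(weights: dict[str, float], fh) -> None:
    """Write one row per ticker under an assumed `ticker,weight` header."""
    writer = csv.writer(fh)
    writer.writerow(["ticker", "weight"])
    for ticker, weight in weights.items():
        writer.writerow([ticker, repr(float(weight))])  # repr round-trips floats exactly


def read_weights_csv(fh) -> dict[str, float]:
    """Read the file back, e.g. to inspect what a backtest will consume."""
    return {row["ticker"]: float(row["weight"]) for row in csv.DictReader(fh)}


buf = io.StringIO()
write_weights_csv({"SPY": 0.6, "AGG": 0.4}, buf)
buf.seek(0)
print(read_weights_csv(buf))  # {'SPY': 0.6, 'AGG': 0.4}
```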

Features in Detail

Portfolio Construction Strategies

The toolkit implements three well-established families of portfolio construction strategies, each with a different optimization objective. This gives four strategy names in total, because mean-variance comes in two variants:

1. Equal Weight (`equal_weight`)

The simplest baseline strategy. It assigns equal weight (1/N) to every asset in the universe.

Characteristics:

- Optimization objective: None (deterministic 1/N allocation)
- Computational cost: O(n), the fastest
- Best for: Benchmark comparison, universes of any size, and cases where you have no reliable estimates of risk or correlation. Note that 1/N equalizes capital, not risk: risk contributions are equal only when all assets share the same volatility and pairwise correlations.

Advantages:

- No estimation error (no parameters to estimate)
- Robust to outliers and data quality issues
- No risk of optimization failure
- Computationally trivial

Disadvantages:

- Ignores correlations and risk differences between assets
- Can concentrate risk when many assets are highly correlated or some are far more volatile than others
- No consideration of expected returns or volatility
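The sketch below shows why 1/N ignores risk differences. It uses NumPy and illustrative volatilities. Take two uncorrelated assets at 20% and 5% volatility: with equal capital weights, the volatile asset supplies about 94% of portfolio variance. The function names are hypothetical, not the toolkit's API.

```python
import numpy as np


def equal_weight(n_assets: int) -> np.ndarray:
    """1/N allocation: nothing is estimated, so nothing can be mis-estimated."""
    return np.full(n_assets, 1.0 / n_assets)


def risk_contributions(w: np.ndarray, cov: np.ndarray) -> np.ndarray:
    """Each asset's share of portfolio variance: w_i * (cov @ w)_i / (w' cov w)."""
    marginal = cov @ w
    return w * marginal / (w @ marginal)


# Two uncorrelated assets with 20% and 5% volatility (illustrative numbers).
cov = np.diag([0.20**2, 0.05**2])
print(risk_contributions(equal_weight(2), cov))  # about [0.94, 0.06]
```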

2. Risk Parity (`risk_parity`)

Allocates capital so that each asset contributes equally to overall portfolio risk (volatility).
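Formally, asset i's share of portfolio variance is RC_i = w_i (Σw)_i / (wᵀΣw), and risk parity seeks weights with RC_i = 1/N for every asset. The sketch below finds such weights by cyclical coordinate descent on ½xᵀΣx − Σ b_i log x_i with b_i = 1/N. The optimum of that objective satisfies x_i (Σx)_i = b_i. This is one standard solver for the problem, shown for illustration. It is not necessarily the method `PortfolioConstructor` uses, and the function names are hypothetical.

```python
import numpy as np


def risk_parity_weights(cov: np.ndarray, tol: float = 1e-10, max_iter: int = 10_000) -> np.ndarray:
    """Equal-risk-contribution weights by cyclical coordinate descent.

    Minimizes 0.5 * x' cov x - sum(b * log(x)) with b_i = 1/n. At the optimum
    x_i * (cov @ x)_i = b_i for all i; rescaling x to sum to one keeps the
    risk contributions equal.
    """
    n = cov.shape[0]
    b = np.full(n, 1.0 / n)
    x = 1.0 / np.sqrt(np.diag(cov))  # start from inverse volatility
    for _ in range(max_iter):
        x_prev = x.copy()
        for i in range(n):
            # Cross term: sum over j != i of cov[i, j] * x[j]
            c = cov[i] @ x - cov[i, i] * x[i]
            # Positive root of cov[i, i] * x_i**2 + c * x_i - b_i = 0
            x[i] = (-c + np.sqrt(c * c + 4.0 * cov[i, i] * b[i])) / (2.0 * cov[i, i])
        if np.max(np.abs(x - x_prev)) < tol:
            break
    return x / x.sum()


def risk_contributions(w: np.ndarray, cov: np.ndarray) -> np.ndarray:
    """Each asset's share of portfolio variance: w_i * (cov @ w)_i / (w' cov w)."""
    marginal = cov @ w
    return w * marginal / (w @ marginal)


vols = np.array([0.20, 0.10, 0.05])
corr = np.array([[1.0, 0.5, 0.2], [0.5, 1.0, 0.3], [0.2, 0.3, 1.0]])
cov = np.outer(vols, vols) * corr
w = risk_parity_weights(cov)
print(w.round(3), risk_contributions(w, cov).round(3))  # contributions are ~1/3 each
```

Coordinate descent is attractive here because each one-dimensional step has a closed-form solution and every iterate stays strictly positive.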

Characteristics:

- Optimization objective: Equal risk contribution from each asset
- Computational cost: O(n²), moderate (covariance computation)
- Best for: Medium to large universes (30-300 assets), when you want balanced risk exposure

Advantages:

- Diversifies risk exposure evenly across assets
- Computationally cheaper than mean-variance
- More stable than mean-variance, because it uses only the covariance matrix, not expected returns
- Handles correlations explicitly

Disadvantages:

- Can overweight low-volatility (often defensive) assets
- Ignores expected returns entirely
- May concentrate in bonds/cash during periods of high market volatility

Large-universe behavior:

- Automatically falls back to inverse-volatility weighting when there are more than 300 assets
- Applies covariance matrix stabilization (diagonal jitter) when the matrix is near-singular
- Still enforces all portfolio constraints in fallback mode
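Below is a sketch of the two safeguards listed above, assuming NumPy. The 300-asset threshold comes from this guide. The conditioning limit (`max_cond`), the starting jitter size, and all function names are illustrative assumptions, not the toolkit's settings.

```python
import numpy as np

FALLBACK_THRESHOLD = 300  # from this guide: inverse volatility above 300 assets


def stabilize_covariance(cov: np.ndarray, max_cond: float = 1e10, jitter: float = 1e-8) -> np.ndarray:
    """Add growing multiples of the identity until cov is well conditioned."""
    cov = (cov + cov.T) / 2.0  # remove round-off asymmetry
    scale = float(np.mean(np.diag(cov)))
    if scale <= 0:
        raise ValueError("covariance diagonal must be positive")
    eps = jitter * scale  # jitter relative to the average variance
    while np.linalg.cond(cov) > max_cond:
        cov = cov + eps * np.eye(cov.shape[0])
        eps *= 10.0
    return cov


def inverse_volatility_weights(cov: np.ndarray) -> np.ndarray:
    """w_i proportional to 1 / sigma_i; uses only the diagonal of cov."""
    inv_vol = 1.0 / np.sqrt(np.diag(cov))
    return inv_vol / inv_vol.sum()


def risk_parity_with_fallback(cov: np.ndarray, solver) -> np.ndarray:
    """Route large universes to inverse volatility; otherwise solve on a stabilized matrix."""
    if cov.shape[0] > FALLBACK_THRESHOLD:
        return inverse_volatility_weights(cov)
    return solver(stabilize_covariance(cov))


vols = np.array([0.2, 0.2, 0.1])
singular = np.outer(vols, vols)  # perfectly correlated assets: rank one
print(np.linalg.cond(stabilize_covariance(singular)) <= 1e10)  # True
```

Inverse volatility needs only the diagonal of the covariance matrix, which is why it stays cheap and well defined when the full matrix is large or nearly singular.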

3. Mean-Variance (`mean_variance_max_sharpe`, `mean_variance_min_volatility`)

Uses Markowitz mean-variance optimization to trade off expected return against risk (variance).

Two variants:

- `mean_variance_max_sharpe`: Maximizes the Sharpe ratio (excess return per unit of volatility)
- `mean_variance_min_volatility`: Minimizes portfolio volatility
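The two problems can be written in standard notation. The symbols are assumed here, not taken from the toolkit's code: μ is the vector of expected returns, Σ the covariance matrix, r_f the risk-free rate, and w_min, w_max the per-asset bounds. Asset-class limits such as `--max-equity` add further linear constraints on sums of weights.

```latex
\begin{aligned}
\texttt{mean\_variance\_min\_volatility}: &\quad \min_{w}\ \sqrt{w^\top \Sigma w} \\
\texttt{mean\_variance\_max\_sharpe}: &\quad \max_{w}\ \frac{\mu^\top w - r_f}{\sqrt{w^\top \Sigma w}} \\
\text{subject to (both)}: &\quad \mathbf{1}^\top w = 1,\qquad w_{\min} \le w_i \le w_{\max}\ \ \forall i
\end{aligned}
```

Minimizing volatility is equivalent to minimizing the quadratic wᵀΣw, which is why it is naturally solved as a quadratic program. The max-Sharpe ratio is not quadratic itself, but a standard change of variables turns it into one.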

Characteristics:

- Optimization objective: Max Sharpe or min volatility
- Computational cost: O(n³), the slowest (quadratic programming)
- Best for: Small to medium universes (<100 assets), when you have confidence in return estimates

Advantages:

- Optimal in theory (maximizes expected utility for a mean-variance investor), provided the return and covariance estimates are accurate
- Explicitly considers expected returns (max-Sharpe variant)
- Can target specific risk/return profiles
- Backed by decades of academic research

Disadvantages:

- Highly sensitive to estimation errors in expected returns (max-Sharpe variant)
- Computationally expensive for large universes
- May produce extreme/concentrated weights
- Can be unstable across rebalancing periods

Large-universe behavior:

- Applies covariance shrinkage for numerical stability
- Falls back to a closed-form tangency approximation if the optimizer fails to converge or the universe is extremely large
- Sanitizes return inputs before optimization
- Validates all constraints after optimization
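The sketch below illustrates the shrinkage and closed-form fallback described above, assuming NumPy. It proceeds in three steps:

1. It shrinks the sample covariance toward a scaled identity with a fixed intensity. Estimators such as Ledoit-Wolf choose the intensity from the data; the toolkit's choice is not specified here.
2. It computes the unconstrained tangency portfolio, w ∝ Σ⁻¹(μ − r_f).
3. It drops short positions and renormalizes.

Because of the clipping in step 3, the result only approximates the long-only max-Sharpe portfolio rather than solving it exactly. The function names and defaults are hypothetical.

```python
import numpy as np


def shrink_covariance(cov: np.ndarray, delta: float = 0.1) -> np.ndarray:
    """(1 - delta) * S + delta * (trace(S) / n) * I; a fixed delta is an assumption."""
    n = cov.shape[0]
    return (1.0 - delta) * cov + delta * (np.trace(cov) / n) * np.eye(n)


def long_only_tangency(mu: np.ndarray, cov: np.ndarray, risk_free: float = 0.0,
                       delta: float = 0.1) -> np.ndarray:
    """Closed-form tangency weights, shorts dropped, renormalized to sum to 1."""
    # Sanitize: treat missing or non-finite expected returns as zero excess return.
    excess = np.where(np.isfinite(mu), mu - risk_free, 0.0)
    raw = np.linalg.solve(shrink_covariance(cov, delta), excess)  # cov^-1 (mu - r_f)
    raw = np.clip(raw, 0.0, None)  # approximation: the exact long-only optimum needs a solver
    if raw.sum() <= 0:
        raise ValueError("no asset has a positive tangency weight")
    return raw / raw.sum()


mu = np.array([0.08, 0.04, np.nan])
cov = np.diag([0.04, 0.01, 0.02])
print(long_only_tangency(mu, cov).round(3))
```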

Large-Universe Hardening

Running the optimizer across hundreds of tickers can surface numerical edge cases. The strategies include guard rails so that 1,000-name universes stay tractable without hand-tuning:

- `risk_parity` automatically falls back to an inverse-volatility solution when more than 300 assets are present, while still enforcing all portfolio constraints. It also stabilizes the covariance matrix with a light diagonal "jitter" whenever the inputs are nearly singular.
- `mean_variance_*` sanitizes return inputs and applies shrinkage to the covariance matrix. It switches to a closed-form tangency approximation when the universe is extremely large or PyPortfolioOpt struggles to converge. The resulting weights are normalized and validated against the same constraint set as the main optimizer.

These safeguards match the configuration used for the `long_history_1000` runs and remove the need for ad-hoc ticker pruning when scaling up experiments.

Portfolio Constraints

Constraints are rules that the final portfolio must obey. They let you impose your own views or risk limits on top of the chosen strategy.

- `--max-weight` / `--min-weight`: Set the maximum and minimum allowable weight for any single asset.
- `--max-equity`: Sets the maximum total allocation to all assets classified as "equity". Requires `--classifications`.
- `--min-bond`: Sets the minimum total allocation to all assets classified as "fixed_income" or "cash". Requires `--classifications`.
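Before running an optimizer, it can help to check that the constraints admit any fully invested portfolio at all. The sketch below sorts assets into three disjoint groups: equity, bond-like (`fixed_income` or `cash`), and everything else. Each group's total weight can take any value in an interval set by the per-asset and group limits. A solution therefore exists exactly when every interval is non-empty and the intervals' bounds bracket 1.

The `Constraints` dataclass, its defaults other than `min_weight = 0.0`, and the use of plain class-label strings are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Constraints:
    # Mirrors --min-weight/--max-weight/--max-equity/--min-bond; defaults other
    # than min_weight = 0.0 are illustrative assumptions.
    min_weight: float = 0.0
    max_weight: float = 1.0
    max_equity: float = 1.0
    min_bond: float = 0.0


def is_feasible(asset_classes: list[str], c: Constraints, tol: float = 1e-12) -> bool:
    """True if some fully invested portfolio satisfies every constraint."""
    if c.min_weight > c.max_weight:
        return False
    n_eq = sum(a == "equity" for a in asset_classes)
    n_bond = sum(a in ("fixed_income", "cash") for a in asset_classes)
    n_other = len(asset_classes) - n_eq - n_bond

    # Achievable (lo, hi) range for each group's total weight.
    eq = (n_eq * c.min_weight, min(n_eq * c.max_weight, c.max_equity))
    bond = (max(n_bond * c.min_weight, c.min_bond), n_bond * c.max_weight)
    other = (n_other * c.min_weight, n_other * c.max_weight)

    if any(lo > hi + tol for lo, hi in (eq, bond, other)):
        return False
    total_lo = eq[0] + bond[0] + other[0]
    total_hi = eq[1] + bond[1] + other[1]
    return total_lo <= 1.0 + tol and total_hi >= 1.0 - tol


classes = ["equity"] * 4 + ["fixed_income"] * 2
print(is_feasible(classes, Constraints(max_equity=0.80, min_bond=0.15)))  # True
print(is_feasible(classes, Constraints(max_weight=0.05, min_bond=0.15)))  # False
```

The second call fails because two bond funds capped at 5% each can hold at most 10%, short of the 15% floor.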

Strategy Comparison Mode

Instead of building a single portfolio, you can pass the `--compare` flag. The script then runs all available strategies and writes a single table comparing the weights each one produces. This makes it easy to see how different theories allocate capital to the same set of assets.
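Conceptually, comparison mode runs each strategy on the same inputs and pivots the results into a ticker-by-strategy table. The minimal sketch below assumes NumPy. `inverse_volatility` stands in for a covariance-based strategy; like the other function names, it is hypothetical rather than one of the toolkit's strategy identifiers.

```python
import numpy as np


def equal_weight(cov: np.ndarray) -> np.ndarray:
    n = cov.shape[0]
    return np.full(n, 1.0 / n)


def inverse_volatility(cov: np.ndarray) -> np.ndarray:
    inv_vol = 1.0 / np.sqrt(np.diag(cov))
    return inv_vol / inv_vol.sum()


def compare_strategies(tickers, cov, strategies):
    """Run every strategy on the same inputs; return {ticker: {strategy: weight}}."""
    table = {t: {} for t in tickers}
    for name, strategy in strategies.items():
        for ticker, weight in zip(tickers, strategy(cov)):
            table[ticker][name] = float(weight)
    return table


table = compare_strategies(
    ["A", "B"],
    np.diag([0.04, 0.01]),
    {"equal_weight": equal_weight, "inverse_volatility": inverse_volatility},
)
for ticker, row in table.items():
    print(ticker, row)
```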

Workflow Overview

```mermaid
flowchart LR
    A["Returns matrix CSV\n`--returns`"]
    B["Classifications CSV\n`--classifications` (optional)"]
    C["PortfolioConstructor engine\nstrategy + constraints"]
    D{"Strategy choice"}
    E["Equal Weight strategy"]
    F["Risk Parity strategy"]
    G["Mean-Variance variants"]
    H["Constraints enforcement\nweights filtered by `--max-equity`, etc."]
    I["Outputs\nweights CSV / comparison table"]

    A --> C
    B --> C
    C --> D
    D --> E
    D --> F
    D --> G
    E --> H
    F --> H
    G --> H
    H --> I
```

The diagram shows the returns matrix (and, optionally, the classifications) feeding the constructor. The constructor branches by strategy, enforces the constraints, and emits the final weights CSV or comparison table.

Usage Examples

```bash
# Construct a simple equal-weight portfolio
python scripts/construct_portfolio.py \
  --returns data/processed/returns.csv \
  --strategy equal_weight \
  --output outputs/portfolio_equal_weight.csv

# Construct a Mean-Variance portfolio with exposure constraints
python scripts/construct_portfolio.py \
  --returns data/processed/returns.csv \
  --classifications data/processed/classified_assets.csv \
  --strategy mean_variance_max_sharpe \
  --max-equity 0.80 \
  --min-bond 0.15 \
  --output outputs/portfolio_mv.csv

# Compare all available strategies
python scripts/construct_portfolio.py \
  --returns data/processed/returns.csv \
  --compare \
  --output outputs/portfolio_comparison.csv
```

Strategy Selection Guidance

Decision Framework

Choose your strategy based on three key factors:

- Universe size: How many assets are you investing in?
- Data quality: How reliable are your return estimates?
- Computational budget: How much time and compute do you have?

| Universe Size | Data Quality | Recommended Strategy | Rationale |
|---|---|---|---|
| Small (<30) | Any | Mean-Variance Max Sharpe | Optimization is tractable and can fully exploit return estimates |
| Medium (30-100) | High | Mean-Variance or Risk Parity | Both feasible; choose based on confidence in return estimates |
| Medium (30-100) | Low | Risk Parity | More robust to estimation error |
| Large (100-300) | Any | Risk Parity | Mean-variance too slow/unstable |
| Very Large (300+) | Any | Equal Weight + Preselection | Use preselection to reduce the universe first |
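The table can be encoded as a small lookup, which is handy in experiment scripts. This sketch is hypothetical in two respects. Where the table's ranges touch (30, 100, 300), it assumes the boundary value belongs to the larger bucket. Where the table says only "Mean-Variance", it lists both mean-variance variants.

```python
def recommend(n_assets: int, high_quality_data: bool) -> tuple[list[str], bool]:
    """Return (candidate strategies, preselect_first) following the decision table."""
    if n_assets < 30:
        return ["mean_variance_max_sharpe"], False
    if n_assets < 100:
        if high_quality_data:
            return ["mean_variance_max_sharpe", "mean_variance_min_volatility", "risk_parity"], False
        return ["risk_parity"], False
    if n_assets < 300:
        return ["risk_parity"], False
    return ["equal_weight"], True  # reduce the universe with preselection first


print(recommend(1000, high_quality_data=True))  # (['equal_weight'], True)
```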

When to Use Each Strategy

Equal Weight:

- ✓ Benchmark comparison
- ✓ Unknown or unreliable correlations
- ✓ Very large universes (after preselection)
- ✓ Regulatory/policy constraints on optimization
- ✗ When you have good risk/return estimates
- ✗ When assets have very different volatilities

Risk Parity:

- ✓ You want balanced risk exposure
- ✓ Medium to large universes (30-300 assets)
- ✓ You distrust return forecasts
- ✓ Assets with different volatilities
- ✗ When you have high-confidence return views
- ✗ Very large universes (>300 assets, unless preselected)

Mean-Variance:

- ✓ You have reliable return estimates
- ✓ Small to medium universes (<100 assets)
- ✓ You want a theoretically optimal allocation
- ✓ You can tolerate concentration
- ✗ Poor-quality data
- ✗ Large universes (computational cost)
- ✗ You need stable weights across rebalancing

Combining with Preselection

For large universes, combine preselection with a strategy:

```bash
# Large universe → preselect → optimize
# Only the 50 preselected assets go into the risk parity optimization
python scripts/run_backtest.py risk_parity \
    --preselect-method combined \
    --preselect-top-k 50
```

Recommended combinations:

- Equal Weight + Momentum Preselection: Simple factor tilt
- Risk Parity + Combined Preselection: Balanced risk on the selected universe
- Mean-Variance + Low-Vol Preselection: Defensive optimization

See docs/preselection.md for preselection details.
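Here is a minimal sketch of the idea behind preselection, assuming NumPy and a returns matrix with one column per ticker. It ranks assets by sample volatility and keeps the `top_k` calmest. It is a hypothetical stand-in for a low-volatility screen; the toolkit's actual scoring, including the `combined` method, is documented in docs/preselection.md and may differ.

```python
import numpy as np


def preselect_low_vol(returns: np.ndarray, tickers: list[str], top_k: int) -> list[str]:
    """Keep the top_k tickers with the lowest sample volatility (one column per ticker)."""
    vol = returns.std(axis=0, ddof=1)
    order = np.argsort(vol, kind="stable")  # ascending; ties keep input order
    return [tickers[i] for i in order[:top_k]]


# Deterministic toy data: alternating +/- moves with a different size per ticker.
signs = np.tile([1.0, -1.0], 50)[:, None]
returns = signs * np.array([0.03, 0.01, 0.02, 0.005])
print(preselect_low_vol(returns, ["A", "B", "C", "D"], top_k=2))  # ['D', 'B']
```

The reduced ticker list would then feed the chosen strategy, so the optimizer only ever sees `top_k` assets.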

Optimization Troubleshooting

Common Issues and Solutions

Issue 1: Optimizer Fails to Converge

Symptoms:

- Error: "Optimization failed" or "No solution found"
- Mean-variance returns no weights

Causes:

- Ill-conditioned (near-singular) covariance matrix
- Infeasible constraints (e.g., `--min-bond` and `--max-equity` values that cannot be satisfied together)
- Insufficient data (not enough history)

Solutions:

- Reduce universe size: use preselection to filter to the top-K assets:

  ```bash
  --preselect-method combined --preselect-top-k 50
  ```

- Relax constraints: check that the constraints can be satisfied together:

  ```bash
  # --max-equity 0.90 leaves at least 10% for non-equity assets, so a 10% floor on
  # fixed_income/cash is compatible, provided the universe holds enough such assets
  --max-equity 0.90 --min-bond 0.10
  ```

- Switch strategy: try risk parity (more robust) or equal weight (always works).
- Check data quality: verify that the returns have sufficient history:

  ```bash
  # Ensure enough data for covariance estimation
  --lookback-periods 252  # At least 1 year
  ```
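The first and third causes can be checked directly. With T observations of n assets, the sample covariance has rank at most min(T − 1, n), so it is singular whenever T ≤ n. For example, with 252 daily observations, a universe of 252 or more assets cannot produce an invertible sample covariance. The sketch below reports that condition together with the condition number. It assumes NumPy, and the function name is hypothetical.

```python
import numpy as np


def covariance_diagnostics(returns: np.ndarray) -> dict:
    """Check whether the sample covariance of a (T x n) returns matrix is usable."""
    t, n = returns.shape
    cov = np.cov(returns, rowvar=False)
    eigenvalues = np.linalg.eigvalsh(cov)  # ascending order
    smallest, largest = eigenvalues[0], eigenvalues[-1]
    return {
        "observations": t,
        "assets": n,
        # Rank is at most min(T - 1, n), so T <= n guarantees a singular matrix.
        "singular_by_construction": t <= n,
        "condition_number": float(largest / smallest) if smallest > 0 else float("inf"),
    }


rng = np.random.default_rng(0)
print(covariance_diagnostics(rng.standard_normal((50, 100)) * 0.01)["singular_by_construction"])  # True
```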

Issue 2: Extreme/Concentrated Weights

Symptoms:

- One or a few assets get >50% allocation
- Many assets get zero or near-zero weight

Causes:

- Mean-variance amplifying estimation errors
- High correlation between assets
- Extreme return forecasts

Solutions:

- Add position limits:

  ```bash
  --max-weight 0.15 --min-weight 0.02
  ```

- Use risk parity instead (naturally more diversified):

  ```bash
  --strategy risk_parity
  ```

- Apply preselection to trim the universe to fewer, less volatile names:

  ```bash
  --preselect-method low_vol --preselect-top-k 30
  ```

- Switch to equal weight if the optimization is unreliable
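Position limits can also be imposed after the fact. The sketch below computes the Euclidean projection of a weight vector onto the set {lo ≤ w_i ≤ hi, Σ w_i = 1}. That projection has the form clip(w − τ, lo, hi) for a scalar τ, which the code finds by bisection. It is an illustration under assumed names, not necessarily how the toolkit enforces `--max-weight`/`--min-weight`, and projecting an optimizer's output gives up its optimality.

The bounds themselves must also be feasible. With `--max-weight 0.15`, the universe needs at least 7 assets (6 × 0.15 = 0.90 < 1). With `--min-weight 0.02`, it can hold at most 50.

```python
import numpy as np


def project_to_bounds(w: np.ndarray, lo: float, hi: float, iters: int = 200) -> np.ndarray:
    """Closest vector to w (Euclidean) with lo <= w_i <= hi and sum(w) == 1."""
    n = w.size
    if not (n * lo <= 1.0 <= n * hi):
        raise ValueError(f"{n} assets cannot sum to 1 within [{lo}, {hi}]")
    # The clipped sum falls monotonically in tau: n*hi at t_lo, n*lo at t_hi.
    t_lo, t_hi = w.min() - hi, w.max() - lo
    for _ in range(iters):
        tau = 0.5 * (t_lo + t_hi)
        if np.clip(w - tau, lo, hi).sum() > 1.0:
            t_lo = tau  # still too much weight: shift further down
        else:
            t_hi = tau
    return np.clip(w - 0.5 * (t_lo + t_hi), lo, hi)


print(project_to_bounds(np.array([0.70, 0.20, 0.05, 0.05]), lo=0.02, hi=0.40))
# approximately [0.40, 0.30, 0.15, 0.15]
```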

Issue 3: Unstable Weights Across Rebalancing

Symptoms:

- Weights change dramatically between rebalances
- High turnover
- The same strategy produces very different allocations

Causes:

- Mean-variance sensitivity to small data changes
- Short lookback window
- Noisy return estimates

Solutions:

- Use the membership policy to stabilize holdings:

  ```bash
  --membership-enabled \
  --membership-buffer-rank 10 \
  --membership-min-hold 3
  ```

- Increase the lookback window:

  ```bash
  --lookback-periods 504  # 2 years instead of 1
  ```

- Switch to risk parity (more stable than mean-variance)
- Apply preselection (filters out marginal assets)
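Turnover is the usual way to quantify instability. One common definition is one-way turnover: half the sum of absolute weight changes. Under this definition, selling 50% of the portfolio to buy something else counts as 0.5. Here is a standard-library sketch; the function name is hypothetical.

```python
def turnover(old: dict[str, float], new: dict[str, float]) -> float:
    """One-way turnover: half the sum of absolute weight changes.

    Tickers present on only one side count as entering or leaving from weight 0.
    """
    tickers = old.keys() | new.keys()
    return 0.5 * sum(abs(new.get(t, 0.0) - old.get(t, 0.0)) for t in tickers)


print(turnover({"A": 0.5, "B": 0.5}, {"A": 0.5, "C": 0.5}))  # 0.5
```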

Issue 4: Out of Memory

Symptoms:

- Python crashes with `MemoryError`
- The system runs out of RAM during optimization

Causes:

- Large universe (>500 assets)
- Mean-variance optimization: O(n³) solve time plus solver workspace on top of the O(n²) covariance matrix
- Long lookback window (the returns matrix grows with periods × assets)

Solutions:

- Use preselection to reduce the universe:

  ```bash
  --preselect-method combined --preselect-top-k 50
  ```

- Switch to risk parity or equal weight (lower resource use):
  - Risk parity: O(n²), dominated by the covariance matrix, instead of mean-variance's O(n³) solve
  - Equal weight: O(n)
- Reduce the lookback window:

  ```bash
  --lookback-periods 126  # 6 months instead of 1 year
  ```

- Run on a machine with more RAM, or use a smaller universe
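For a sense of scale, the two largest input arrays are easy to estimate. A float64 returns matrix needs 8·T·n bytes, and the covariance matrix needs 8·n² bytes. At n = 1,000 assets and T = 252 periods, that is about 1.9 MiB and 7.6 MiB. An out-of-memory failure at that size therefore likely comes from solver workspace or intermediate copies rather than from the inputs themselves. A sketch (name hypothetical):

```python
def estimate_memory_mib(n_assets: int, n_periods: int, bytes_per_value: int = 8) -> dict[str, float]:
    """Rough float64 footprint of the returns and covariance matrices (solver workspace excluded)."""
    mib = 1024 ** 2
    return {
        "returns_matrix_mib": n_periods * n_assets * bytes_per_value / mib,
        "covariance_mib": n_assets * n_assets * bytes_per_value / mib,
    }


print(estimate_memory_mib(n_assets=1000, n_periods=252))
```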

Issue 5: Negative Weights (Short Positions)

Symptoms:

- Some assets have negative weights
- The portfolio appears to be shorting assets

Causes:

- Constraints that allow shorting (not the default behavior)
- A bug in constraint enforcement

Solutions:

- Set the minimum weight to zero (no shorting):

  ```bash
  --min-weight 0.0  # This is the default
  ```

- Verify constraint enforcement in the output
- Report an issue if negative weights appear despite `--min-weight 0`
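A quick post-run check of the weights file catches negative or out-of-bound weights. The sketch below parses a two-column CSV (header row skipped, column names not assumed) and lists any violations. The function names and the tolerance are hypothetical. For a real run, pass `open(path, newline="")` instead of the in-memory sample.

```python
import csv
import io


def load_weights(fh) -> dict[str, float]:
    """Parse a two-column weights CSV from an open file; the header row is skipped."""
    reader = csv.reader(fh)
    next(reader)  # column names are not assumed
    return {row[0]: float(row[1]) for row in reader if row}


def validate_weights(weights: dict[str, float], min_weight: float = 0.0,
                     max_weight: float = 1.0, tol: float = 1e-6) -> list[str]:
    """Return human-readable violations; an empty list means no problems were found."""
    problems = []
    total = sum(weights.values())
    if abs(total - 1.0) > tol:
        problems.append(f"weights sum to {total:.6f}, not 1")
    for ticker, w in weights.items():
        if w < min_weight - tol:
            problems.append(f"{ticker}: {w:.6f} is below the minimum {min_weight}")
        if w > max_weight + tol:
            problems.append(f"{ticker}: {w:.6f} is above the maximum {max_weight}")
    return problems


sample = io.StringIO("ticker,weight\nSPY,0.65\nAGG,0.40\nGLD,-0.05\n")
for problem in validate_weights(load_weights(sample)):
    print(problem)  # flags the negative GLD weight
```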

Performance Tips

For small universes (<50 assets):

- All strategies work well
- Mean-variance is tractable
- Focus on data quality over computational efficiency

For medium universes (50-200 assets):

- Prefer risk parity over mean-variance
- Consider preselection before mean-variance
- Enable caching for faster rebalancing

For large universes (200+ assets):

- Always use preselection first
- Prefer equal weight or risk parity
- Avoid mean-variance unless the universe has been preselected to <100 assets
- Enable caching and the membership policy

General best practices:

- Start with equal weight as a baseline
- Test risk parity (usually the best balance)
- Try mean-variance only with <100 assets and good data
- Run the strategy comparison (`--compare`) before committing to a single strategy
- Use the membership policy in production to control turnover

CLI Reference

All CLI parameters for `scripts/construct_portfolio.py` are documented in the CLI Reference. This guide focuses on strategy selection, constraints, and usage patterns.

Cardinality Constraints

Cardinality constraints limit the number of positions in the portfolio. The toolkit addresses them in two ways, one implemented and one planned:

- Preselection (current): Use factor-based preselection to filter assets before optimization.
  - See docs/preselection.md for details
  - Fast, deterministic, factor-driven
  - No special solver requirements
- Optimizer-integrated (future): Enforce cardinality within the optimization itself.
  - See docs/cardinality_constraints.md for design details
  - Methods: MIQP, heuristics, relaxation
  - Currently design stubs only

The preselection approach is production-ready and recommended for most use cases. The optimizer-integrated approach provides extension points for future development when commercial solvers or heuristic algorithms are needed.

For more information:

- Current implementation: docs/preselection.md
- Future design: docs/cardinality_constraints.md
