Portfolio Construction Script: construct_portfolio.py

Overview

This script is the fifth step in the portfolio management toolkit's data pipeline. This is where financial theory is applied to the historical returns data to create a portfolio.

Its purpose is to take the matrix of asset returns and, based on a chosen investment strategy and a set of user-defined constraints, calculate the optimal weight (or allocation percentage) for each asset.

Like the other scripts, this file acts as a command-line orchestrator for the core logic encapsulated in the PortfolioConstructor class.

Inputs (Prerequisites)

This script requires one primary input and one optional input from previous steps:

  1. Returns Matrix CSV (Required)
     • Generated by: scripts/calculate_returns.py
     • Specified via: --returns
     • Purpose: This file provides the historical return data for all assets, which is the foundation for all quantitative strategies.

  2. Classifications CSV (Optional)
     • Generated by: scripts/classify_assets.py
     • Specified via: --classifications
     • Purpose: This file is only needed if you intend to use constraints based on asset classes, such as --max-equity or --min-bond.

Script Products

The script's output depends on its mode of operation:

  1. Portfolio Weights CSV (Primary Product)
     • Description: When running with a single strategy, the script produces a CSV file with two columns: one for the asset tickers and one for their calculated weight in the portfolio. This file is the direct input for the run_backtest.py script.

  2. Strategy Comparison CSV
     • Description: When running with the --compare flag, the script produces a CSV file where the rows are the asset tickers and each column represents a different investment strategy. The values in the table are the weights assigned to each asset by each strategy. This is useful for analysis and comparison.

Features in Detail

Portfolio Construction Strategies

The toolkit implements three well-established portfolio construction strategies, each with different optimization objectives:

1. Equal Weight (equal_weight)

The simplest baseline strategy. Assigns equal weight (1/N) to every asset in the universe.

Characteristics:

  • Optimization objective: None (deterministic 1/N allocation)
  • Computational cost: O(n) - fastest
  • Best for: Small universes, benchmark comparison, when you want a naive baseline that does not depend on volatility or correlation estimates

Advantages:

  • No estimation error (no parameters to estimate)
  • Robust to outliers and data quality issues
  • No risk of optimization failure
  • Computationally trivial

Disadvantages:

  • Ignores correlations and risk differences between assets
  • Can lead to concentrated risk if assets are highly correlated
  • No consideration of expected returns or volatility
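The 1/N rule can be sketched in a few lines of plain Python. The function name `equal_weights` and the tickers are hypothetical, chosen for illustration; they are not taken from the PortfolioConstructor class.

```python
def equal_weights(tickers):
    """Return a 1/N weight per ticker (illustrative helper, not the toolkit's API)."""
    if not tickers:
        raise ValueError("need at least one asset")
    weight = 1.0 / len(tickers)
    return {ticker: weight for ticker in tickers}


# Hypothetical four-asset universe
weights = equal_weights(["AAA", "BBB", "CCC", "DDD"])
print(weights)
```

Because no parameters are estimated from the returns data, the result depends only on the number of assets.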

2. Risk Parity (risk_parity)

Allocates capital so that each asset contributes equally to overall portfolio risk (volatility).

Characteristics:

  • Optimization objective: Equal risk contribution from each asset
  • Computational cost: O(n²) - moderate (covariance computation)
  • Best for: Medium to large universes (30-300 assets), when you want balanced risk exposure

Advantages:

  • Diversifies risk exposure evenly across assets
  • Computationally efficient (faster than mean-variance)
  • More stable than mean-variance (uses only risk estimates, not expected returns)
  • Handles correlations explicitly

Disadvantages:

  • Can overweight low-volatility (often defensive) assets
  • Ignores expected returns entirely
  • May concentrate in bonds/cash during high market volatility

Large-Universe Behavior:

  • Automatically falls back to inverse-volatility when >300 assets
  • Applies covariance matrix stabilization (diagonal jitter) when near-singular
  • All portfolio constraints still enforced in fallback mode
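To make "equal risk contribution" concrete, here is a minimal NumPy sketch of the inverse-volatility rule and of per-asset risk contributions. The function names and the covariance matrix are assumptions for illustration; the toolkit's own solver and fallback may differ in detail.

```python
import numpy as np


def inverse_volatility_weights(cov):
    """Weights proportional to 1/sigma_i, normalized to sum to 1."""
    vol = np.sqrt(np.diag(cov))
    raw = 1.0 / vol
    return raw / raw.sum()


def risk_contributions(weights, cov):
    """Fraction of portfolio variance attributable to each asset."""
    marginal = cov @ weights          # proportional to d(variance)/d(w_i)
    total_variance = weights @ marginal
    return weights * marginal / total_variance


# Hypothetical uncorrelated assets with volatilities of 20%, 10%, and 5%
cov = np.diag([0.04, 0.01, 0.0025])
w = inverse_volatility_weights(cov)
rc = risk_contributions(w, cov)
print(w, rc)
```

With uncorrelated assets, inverse-volatility weights give exactly equal risk contributions. Once correlations are non-zero the two diverge, which is why inverse volatility is only an approximation of full risk parity.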

3. Mean-Variance (mean_variance_max_sharpe, mean_variance_min_volatility)

Finds the portfolio with optimal risk-adjusted return using Markowitz's mean-variance optimization.

Two variants:

  • mean_variance_max_sharpe: Maximizes Sharpe ratio (risk-adjusted return)
  • mean_variance_min_volatility: Minimizes portfolio volatility

Characteristics:

  • Optimization objective: Max Sharpe or min volatility
  • Computational cost: O(n³) - slowest (quadratic programming)
  • Best for: Small to medium universes (<100 assets), when you have confidence in return estimates

Advantages:

  • Theoretically optimal under the model's assumptions (maximizes expected utility for a mean-variance investor)
  • Explicitly considers expected returns
  • Can target specific risk/return profiles
  • Backed by decades of academic research

Disadvantages:

  • Highly sensitive to estimation errors in expected returns
  • Computationally expensive for large universes
  • May produce extreme/concentrated weights
  • Can be unstable across rebalancing periods

Large-Universe Behavior:

  • Applies covariance shrinkage for numerical stability
  • Falls back to closed-form tangency solution if optimizer fails
  • Sanitizes return inputs before optimization
  • All constraints validated after optimization
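The closed-form tangency solution mentioned above can be sketched with NumPy. This is the textbook unconstrained formula, with weights proportional to inv(Sigma) @ (mu - rf); the function name, inputs, and risk-free default are assumptions, and the toolkit's actual fallback may add normalization or constraint handling not shown here.

```python
import numpy as np


def tangency_weights(mu, cov, risk_free=0.0):
    """Unconstrained max-Sharpe weights, scaled to sum to 1."""
    excess = np.asarray(mu, dtype=float) - risk_free
    raw = np.linalg.solve(cov, excess)   # solves cov @ raw = excess without an explicit inverse
    total = raw.sum()
    if abs(total) < 1e-12:
        raise ValueError("tangency portfolio undefined: raw weights sum to ~0")
    return raw / total


# Hypothetical two-asset example with uncorrelated returns
mu = np.array([0.08, 0.05])
cov = np.array([[0.04, 0.0],
                [0.0, 0.01]])
w = tangency_weights(mu, cov)
print(w)
```

The formula places no bounds on individual weights, so it can produce short or highly concentrated positions. That is one reason the weights need to be validated against the constraint set afterwards.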

Large-Universe Hardening

Running the optimiser across hundreds of tickers can surface numerical edge cases. The strategies now include guard rails so 1,000-name universes stay tractable without hand-tuning:

  • risk_parity automatically falls back to an inverse-volatility solution when more than 300 assets are present, while still enforcing all portfolio constraints, and it stabilises the covariance matrix with a light diagonal “jitter” whenever the inputs are nearly singular.
  • mean_variance_* sanitises return inputs, applies shrinkage to the covariance matrix, and will switch to a closed-form tangency approximation when the universe is extremely large or PyPortfolioOpt struggles to converge. The resulting weights are normalised and validated against the same constraint set as the main optimiser.

These safeguards match the configuration used for the long_history_1000 runs and remove the need for ad-hoc ticker pruning when scaling up experiments.
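The following is a minimal sketch of the two stabilization ideas named above: shrinkage toward the diagonal and a light diagonal jitter. The shrinkage intensity, condition-number threshold, and jitter scale are illustrative assumptions, not the values used by the toolkit.

```python
import numpy as np


def stabilize_covariance(cov, shrinkage=0.1, max_condition=1e8, jitter=1e-8):
    """Shrink toward the diagonal, then add jitter if still ill-conditioned."""
    cov = np.asarray(cov, dtype=float)
    target = np.diag(np.diag(cov))                 # keep variances, drop covariances
    shrunk = (1.0 - shrinkage) * cov + shrinkage * target
    if np.linalg.cond(shrunk) > max_condition:
        n = len(shrunk)
        scale = np.trace(shrunk) / n               # average variance sets the jitter scale
        shrunk = shrunk + jitter * scale * np.eye(n)
    return shrunk


# Two perfectly correlated assets: the raw matrix is singular
singular = np.array([[0.04, 0.04],
                     [0.04, 0.04]])
stable = stabilize_covariance(singular)
print(np.linalg.cond(stable))
```

Shrinkage leaves the variances on the diagonal unchanged and only pulls the off-diagonal terms toward zero, which is enough here to make the matrix invertible.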

Portfolio Constraints

Constraints are rules that the final portfolio must obey. They allow you to enforce your own views or risk limits on top of the chosen strategy.

  • --max-weight / --min-weight: Sets the maximum and minimum allowable weight for any single asset.
  • --max-equity: Sets the maximum total allocation to all assets classified as "equity".
  • --min-bond: Sets the minimum total allocation to all assets classified as "fixed_income" or "cash".
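As an illustration of what these rules mean for a finished portfolio, the sketch below checks a set of weights against the four constraints. The function name, tickers, and tolerance are hypothetical; only the class labels ("equity", "fixed_income", "cash") come from the descriptions above.

```python
def check_constraints(weights, asset_class, max_weight=1.0, min_weight=0.0,
                      max_equity=1.0, min_bond=0.0, tol=1e-9):
    """Return a list of violations (empty when every constraint holds)."""
    violations = []
    for ticker, w in weights.items():
        if w > max_weight + tol:
            violations.append(f"{ticker}: weight {w:.4f} above max-weight {max_weight}")
        if w < min_weight - tol:
            violations.append(f"{ticker}: weight {w:.4f} below min-weight {min_weight}")
    equity = sum(w for t, w in weights.items() if asset_class.get(t) == "equity")
    bond = sum(w for t, w in weights.items()
               if asset_class.get(t) in ("fixed_income", "cash"))
    if equity > max_equity + tol:
        violations.append(f"equity total {equity:.4f} above max-equity {max_equity}")
    if bond < min_bond - tol:
        violations.append(f"bond/cash total {bond:.4f} below min-bond {min_bond}")
    return violations


# Hypothetical three-asset portfolio
weights = {"EQ1": 0.7, "BOND1": 0.2, "CASH1": 0.1}
classes = {"EQ1": "equity", "BOND1": "fixed_income", "CASH1": "cash"}
violations = check_constraints(weights, classes, max_equity=0.6, min_bond=0.15)
print(violations)
```

Here the 70% equity allocation breaches the 60% cap, while the combined 30% in fixed income and cash satisfies the 15% floor.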

Strategy Comparison Mode

Instead of building a single portfolio, you can use the --compare flag. This tells the script to run all available strategies and generate a single table comparing the weights produced by each, making it easy to see how different theories allocate capital to the same set of assets.
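The shape of the comparison table (tickers as rows, strategies as columns) can be sketched with the standard library. The weights, the helper name, and the "ticker" header label are assumptions; the script's real column names may differ.

```python
import csv
import io


def comparison_rows(strategy_weights):
    """Pivot {strategy: {ticker: weight}} into a header plus one row per ticker."""
    strategies = list(strategy_weights)
    tickers = sorted({t for w in strategy_weights.values() for t in w})
    header = ["ticker"] + strategies
    rows = [[t] + [strategy_weights[s].get(t, 0.0) for s in strategies]
            for t in tickers]
    return header, rows


# Hypothetical weights from two strategies
results = {
    "equal_weight": {"AAA": 0.5, "BBB": 0.5},
    "risk_parity": {"AAA": 0.2, "BBB": 0.8},
}
header, rows = comparison_rows(results)

buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(header)
writer.writerows(rows)
print(buffer.getvalue())
```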

Workflow Overview

flowchart LR
    A["Returns matrix CSV\n`--returns`"]
    B["Classifications CSV\n`--classifications` (optional)"]
    C["PortfolioConstructor engine\nstrategy + constraints"]
    D{"Strategy choice"}
    E["Equal Weight strategy"]
    F["Risk Parity strategy"]
    G["Mean-Variance variants"]
    H["Constraints enforcement\nweights filtered by `--max-equity`, etc."]
    I["Outputs\nweights CSV / comparison table"]

    A --> C
    B --> C
    C --> D
    D --> E
    D --> F
    D --> G
    E --> H
    F --> H
    G --> H
    H --> I

The diagram highlights that returns (and optional classifications) feed the constructor, which branches by strategy, applies constraint checks, and emits the final weights/comparison artifacts.

Usage Examples

# Construct a simple equal-weight portfolio
python scripts/construct_portfolio.py \
  --returns data/processed/returns.csv \
  --strategy equal_weight \
  --output outputs/portfolio_equal_weight.csv

# Construct a Mean-Variance portfolio with exposure constraints
python scripts/construct_portfolio.py \
  --returns data/processed/returns.csv \
  --classifications data/processed/classified_assets.csv \
  --strategy mean_variance_max_sharpe \
  --max-equity 0.80 \
  --min-bond 0.15 \
  --output outputs/portfolio_mv.csv

# Compare all available strategies
python scripts/construct_portfolio.py \
  --returns data/processed/returns.csv \
  --compare \
  --output outputs/portfolio_comparison.csv

Strategy Selection Guidance

Decision Framework

Choose your strategy based on three key factors:

  1. Universe Size: How many assets are you investing in?
  2. Data Quality: How reliable are your return estimates?
  3. Computational Budget: How much time/resources do you have?

| Universe Size | Data Quality | Recommended Strategy | Rationale |
|---|---|---|---|
| Small (<30) | Any | Mean-Variance Max Sharpe | Optimization tractable, can fully exploit return estimates |
| Medium (30-100) | High | Mean-Variance or Risk Parity | Both feasible, choose based on return confidence |
| Medium (30-100) | Low | Risk Parity | More robust to estimation error |
| Large (100-300) | Any | Risk Parity | Mean-variance too slow/unstable |
| Very Large (300+) | Any | Equal Weight + Preselection | Use preselection to reduce universe first |
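The decision table can also be read as a small lookup function. This sketch is illustrative only: the function name is hypothetical, and because the table's ranges share the boundary values 100 and 300, the choice to place those values in the smaller bucket is an assumption.

```python
def recommend_strategy(n_assets, data_quality="high"):
    """Map universe size and data quality to the table's recommendation."""
    if n_assets < 30:
        return "Mean-Variance Max Sharpe"
    if n_assets <= 100:
        if data_quality == "high":
            return "Mean-Variance or Risk Parity"
        return "Risk Parity"
    if n_assets <= 300:
        return "Risk Parity"
    return "Equal Weight + Preselection"


for size, quality in [(20, "low"), (60, "high"), (60, "low"), (250, "high"), (1000, "high")]:
    print(size, quality, "->", recommend_strategy(size, quality))
```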

When to Use Each Strategy

Equal Weight:

  • ✓ Benchmark comparison
  • ✓ Unknown/unreliable correlations
  • ✓ Very large universes (after preselection)
  • ✓ Regulatory/policy constraints on optimization
  • ✗ When you have good risk/return estimates
  • ✗ When assets have very different volatilities

Risk Parity:

  • ✓ Want balanced risk exposure
  • ✓ Medium-large universes (30-300 assets)
  • ✓ Distrust return forecasts
  • ✓ Assets with different volatilities
  • ✗ When you have high-confidence return views
  • ✗ Very large universes (>300, unless with preselection)

Mean-Variance:

  • ✓ Have reliable return estimates
  • ✓ Small-medium universes (<100 assets)
  • ✓ Want theoretically optimal allocation
  • ✓ Can tolerate concentration
  • ✗ Poor quality data
  • ✗ Large universes (computational cost)
  • ✗ Need stable weights across rebalancing

Combining with Preselection

For large universes, combine preselection with strategy:

# Large universe → preselect → optimize
# With --preselect-top-k 50, only 50 assets go into the risk parity optimization
python scripts/run_backtest.py risk_parity \
    --preselect-method combined \
    --preselect-top-k 50

Recommended Combinations:

  • Equal Weight + Momentum Preselection: Simple factor tilt
  • Risk Parity + Combined Preselection: Balanced risk on selected universe
  • Mean-Variance + Low-Vol Preselection: Defensive optimization

See docs/preselection.md for preselection details.

Optimization Troubleshooting

Common Issues and Solutions

Issue 1: Optimizer Fails to Converge

Symptoms:

  • Error: "Optimization failed" or "No solution found"
  • Mean-variance returns no weights

Causes:

  • Ill-conditioned covariance matrix (near-singular)
  • Infeasible constraints (e.g., min-bond + max-equity impossible to satisfy)
  • Insufficient data (not enough history)

Solutions:

  1. Reduce universe size: Use preselection to filter to top-K assets
     --preselect-method combined --preselect-top-k 50
  2. Relax constraints: Check that constraints are feasible
     # Make sure min + max constraints are compatible
     --max-equity 0.90 --min-bond 0.10  # A 0.10 bond floor leaves room for up to 0.90 equity: feasible
  3. Switch strategy: Try risk parity (more robust) or equal weight (always works)
  4. Check data quality: Verify returns have sufficient history
     # Ensure enough data for covariance estimation
     --lookback-periods 252  # At least 1 year
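Constraint feasibility can be screened with simple arithmetic before running the optimizer. The sketch below checks a few necessary conditions only (passing them does not guarantee a solution); the function name and the example universe are hypothetical.

```python
def infeasibility_reasons(n_assets, n_equity, n_bond, max_weight=1.0,
                          min_weight=0.0, max_equity=1.0, min_bond=0.0):
    """List obvious conflicts between constraints (necessary checks only)."""
    eps = 1e-12
    reasons = []
    if n_assets * max_weight < 1.0 - eps:
        reasons.append("max-weight too low: weights cannot sum to 1")
    if n_assets * min_weight > 1.0 + eps:
        reasons.append("min-weight too high: weights would exceed 1")
    if n_bond * max_weight < min_bond - eps:
        reasons.append("min-bond unreachable under max-weight")
    # Non-equity assets must be able to absorb at least 1 - max_equity
    if (n_assets - n_equity) * max_weight < (1.0 - max_equity) - eps:
        reasons.append("max-equity forces more non-equity than max-weight allows")
    return reasons


# Hypothetical universe: 18 equities, 2 bond funds, 10% cap per asset, 30% bond floor
reasons = infeasibility_reasons(n_assets=20, n_equity=18, n_bond=2,
                                max_weight=0.10, min_bond=0.30)
print(reasons)
```

In this example two bond assets capped at 10% each can hold at most 20%, so a 30% bond floor cannot be met regardless of strategy.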

Issue 2: Extreme/Concentrated Weights

Symptoms:

  • One or few assets get >50% allocation
  • Many assets get zero or near-zero weight

Causes:

  • Mean-variance amplifying estimation errors
  • High correlation between assets
  • Extreme return forecasts

Solutions:

  1. Add position limits:
     --max-weight 0.15 --min-weight 0.02
  2. Use risk parity instead (naturally more diversified):
     --strategy risk_parity
  3. Apply preselection to reduce correlation:
     --preselect-method low_vol --preselect-top-k 30
  4. Switch to equal weight if optimization unreliable
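Concentration can be quantified directly from a weights file before deciding which remedy to apply. This sketch reports the largest weight, the effective number of holdings (the inverse of the sum of squared weights), and a count of near-zero positions; the function name, the near-zero threshold, and the example weights are assumptions.

```python
def concentration_report(weights, near_zero=1e-4):
    """Summarize how concentrated a set of portfolio weights is."""
    values = list(weights.values())
    herfindahl = sum(w * w for w in values)      # sum of squared weights
    return {
        "max_weight": max(values),
        "effective_n": 1.0 / herfindahl,          # equals N for an equal-weight portfolio
        "near_zero_count": sum(1 for w in values if abs(w) < near_zero),
    }


# Hypothetical concentrated portfolio
report = concentration_report({"AAA": 0.7, "BBB": 0.3, "CCC": 0.0})
print(report)
```

An effective number of holdings far below the number of assets in the universe is a quick signal that a few positions dominate the portfolio.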

Issue 3: Unstable Weights Across Rebalancing

Symptoms:

  • Weights change dramatically between rebalances
  • High turnover
  • Same strategy produces very different allocations

Causes:

  • Mean-variance sensitivity to small data changes
  • Short lookback window
  • Noisy return estimates

Solutions:

  1. Use membership policy to stabilize holdings:
     --membership-enabled \
     --membership-buffer-rank 10 \
     --membership-min-hold 3
  2. Increase lookback window:
     --lookback-periods 504  # 2 years instead of 1
  3. Switch to risk parity (more stable than mean-variance)
  4. Apply preselection (filters out marginal assets)
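Instability is easier to track once it is measured. A common convention, assumed here, is one-way turnover: half the sum of absolute weight changes between two rebalances. The function name and weights are hypothetical.

```python
def one_way_turnover(old_weights, new_weights):
    """Half the sum of absolute weight changes between two rebalances."""
    tickers = set(old_weights) | set(new_weights)
    return 0.5 * sum(abs(new_weights.get(t, 0.0) - old_weights.get(t, 0.0))
                     for t in tickers)


# Hypothetical consecutive rebalances: asset CCC enters the portfolio
previous = {"AAA": 0.5, "BBB": 0.5}
current = {"AAA": 0.3, "BBB": 0.4, "CCC": 0.3}
turnover = one_way_turnover(previous, current)
print(f"{turnover:.2%}")
```

Computing this figure before and after changing the lookback window or enabling the membership policy shows whether the change actually reduced trading.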

Issue 4: Out of Memory

Symptoms:

  • Python crashes with MemoryError
  • System runs out of RAM during optimization

Causes:

  • Large universe (>500 assets)
  • Mean-variance optimization (the most resource-intensive strategy, at O(n³) cost)
  • Long lookback window

Solutions:

  1. Use preselection to reduce universe:
     --preselect-method combined --preselect-top-k 50
  2. Switch to risk parity or equal weight (lower memory):
     • Risk parity: O(n²) instead of O(n³)
     • Equal weight: O(n)
  3. Reduce lookback window:
     --lookback-periods 126  # 6 months instead of 1 year
  4. Run on machine with more RAM or use smaller universe
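A rough estimate of raw array sizes helps judge whether a universe will fit in memory. The sketch assumes dense float64 storage (8 bytes per value) and counts only the returns matrix and the covariance matrix; solver workspaces and intermediate copies are not included and can be several times larger.

```python
def covariance_memory_mb(n_assets, bytes_per_value=8):
    """Size of a dense n x n covariance matrix in megabytes."""
    return n_assets ** 2 * bytes_per_value / 1e6


def returns_memory_mb(n_assets, lookback_periods, bytes_per_value=8):
    """Size of a dense lookback x n returns matrix in megabytes."""
    return n_assets * lookback_periods * bytes_per_value / 1e6


for n in (50, 500, 1000, 5000):
    print(n, covariance_memory_mb(n), returns_memory_mb(n, 252))
```

The covariance term grows with the square of the universe size, so cutting the universe from 1,000 to 50 assets via preselection shrinks that matrix by a factor of 400.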

Issue 5: Negative Weights (Short Positions)

Symptoms:

  • Some assets have negative weights
  • Portfolio seems to be shorting assets

Causes:

  • Constraints allow shorting (not default behavior)
  • Bug in constraint enforcement

Solutions:

  1. Set minimum weight to zero (no shorting):
     --min-weight 0.0  # This is the default
  2. Verify constraint enforcement in output
  3. Report issue if negative weights appear despite min-weight=0

Performance Tips

For Small Universes (<50 assets):

  • All strategies work well
  • Mean-variance is tractable
  • Focus on data quality over computational efficiency

For Medium Universes (50-200 assets):

  • Prefer risk parity over mean-variance
  • Consider preselection for mean-variance
  • Enable caching for faster rebalancing

For Large Universes (200+ assets):

  • Always use preselection first
  • Prefer equal weight or risk parity
  • Mean-variance not recommended unless preselected to <100 assets
  • Enable caching and membership policy

General Best Practices:

  1. Start with equal weight (baseline)
  2. Test risk parity (usually best balance)
  3. Try mean-variance only if <100 assets and good data
  4. Always run strategy comparison (--compare) first
  5. Use membership policy in production to control turnover

CLI Reference

All CLI parameters for scripts/construct_portfolio.py live in the CLI Reference; this guide focuses on strategy selection, constraints, and usage patterns.

Cardinality Constraints

Cardinality constraints limit the number of positions in the portfolio. The toolkit supports two approaches:

  1. Preselection (Current): Use factor-based preselection to filter assets before optimization
     • See docs/preselection.md for details
     • Fast, deterministic, factor-driven
     • No special solver requirements

  2. Optimizer-Integrated (Future): Enforce cardinality within the optimization
     • See docs/cardinality_constraints.md for design details
     • Methods: MIQP, heuristics, relaxation
     • Currently design stubs only

The preselection approach is production-ready and recommended for most use cases. The optimizer-integrated approach provides extension points for future development when commercial solvers or heuristic algorithms are needed.

For more information:

  • Current implementation: docs/preselection.md
  • Future design: docs/cardinality_constraints.md

See Also