
# Project Workflow Guide

## 📊 Complete System Documentation

- For a full, end-to-end visualization of the system, see architecture/COMPLETE_WORKFLOW.md.
- For the canonical file and CLI contracts across steps, see architecture/INTERFACE_CONTRACTS.md.

This page is a concise, high-level guide to running the pipeline, either through the managed path or step by step.

## End-to-End Overview

```mermaid
flowchart LR
  A[Prepare tradeable data] --> B{Use Universe Manager?}
  B -- yes --> C[manage_universes: select/classify/returns]
  B -- no --> C2[manual: select -> classify -> returns]
  C --> D[construct_portfolio: weights]
  C2 --> D
  D --> E[run_backtest: reports + CSVs]
  E --> F[visualization: charts/tables]
```

## Overview

This document outlines the two primary ways to use the portfolio management toolkit: the Managed Workflow and the Manual Workflow.

The Managed Workflow is the recommended approach for repeatable, consistent analysis. It uses the `config/universes.yaml` file as a central blueprint and the `scripts/manage_universes.py` script to orchestrate the data pipeline.

The Manual Workflow involves running each script individually and is best suited for debugging and experimentation.


## Managed Workflow

This workflow is designed for efficiency and repeatability.

### Phase 1: One-Time Setup

These steps only need to be performed once, or whenever your raw data sources change.

- **Step 1: Prepare Raw Data**
    - Action: Run the `scripts/prepare_tradeable_data.py` script.
    - Purpose: Scan all of your raw price data and generate the master list of every asset the system can work with (`tradeable_matches.csv`).
    - Documentation: data_preparation.md
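Conceptually, this step is a scan over per-symbol price files that sorts them into "tradeable" and "unmatched" lists. The sketch below illustrates the idea only; the directory layout, the two-row threshold, and the function name are assumptions for illustration, not the script's actual contract:

```python
import csv
from pathlib import Path

def scan_raw_prices(raw_dir: Path, min_rows: int = 2) -> tuple[list[str], list[str]]:
    """Split raw per-symbol CSVs into tradeable (enough data rows) vs. unmatched."""
    matched, unmatched = [], []
    for path in sorted(raw_dir.glob("*.csv")):
        with path.open() as fh:
            data_rows = max(sum(1 for _ in csv.reader(fh)) - 1, 0)  # exclude header
        (matched if data_rows >= min_rows else unmatched).append(path.stem)
    return matched, unmatched
```

The real script also records richer match metadata; see data_preparation.md for the actual columns written to `tradeable_matches.csv`.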

- **Step 2: Define Your Universes**
    - Action: Manually edit the `config/universes.yaml` file.
    - Purpose: Define the rules and parameters for your investment strategies (universes).
    - Documentation: universes.md
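A universe definition might look something like the fragment below. Every key and value here is illustrative, invented to show the general shape of a rules-plus-parameters blueprint; consult universes.md for the real schema:

```yaml
# Illustrative only -- see universes.md for the actual keys the toolkit expects.
universes:
  core_equity:                  # name passed to `manage_universes.py load core_equity`
    selection:
      min_history_days: 252     # require at least one year of prices
      asset_types: [equity, etf]
    classification:
      method: rules             # how asset_class / geography get assigned
    returns:
      frequency: daily
      method: simple
```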

### Phase 2: Universe-Driven Pipeline

These steps are performed whenever you want to generate data for a universe or run a backtest.

- **Step 3: Generate Universe Data**
    - Action: Run `python scripts/manage_universes.py load <your_universe_name>`.
    - Purpose: This single command reads your YAML file and automatically runs the asset selection, classification, and return calculation steps, saving the final returns data to a CSV file.
    - Documentation: manage_universes.md
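The last of the three chained steps, return calculation, is the transformation that produces the wide returns matrix. A minimal pandas sketch, assuming simple (percentage-change) returns and illustrative tickers; the toolkit's actual return methods are documented in calculate_returns.md:

```python
import pandas as pd

def prices_to_returns(prices: pd.DataFrame) -> pd.DataFrame:
    """Wide price matrix (dates x assets) -> wide simple-return matrix."""
    return prices.sort_index().pct_change().dropna(how="all")

prices = pd.DataFrame(
    {"AAPL": [100.0, 110.0, 99.0], "MSFT": [200.0, 202.0, 204.02]},
    index=pd.to_datetime(["2024-01-02", "2024-01-03", "2024-01-04"]),
)
returns = prices_to_returns(prices)  # this shape is what gets written out as CSV
```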

- **Step 4: Construct Portfolio**
    - Action: Run the `scripts/construct_portfolio.py` script.
    - Purpose: Take the returns data from the previous step and apply a financial strategy (e.g., Mean-Variance, Risk Parity) to determine the optimal asset weights.
    - Documentation: portfolio_construction.md
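To make the returns-in, weights-out contract concrete, here is the simplest strategy in that family, naive inverse-volatility risk parity. This is a sketch of the general idea, not necessarily how the script implements Risk Parity:

```python
import pandas as pd

def inverse_vol_weights(returns: pd.DataFrame) -> pd.Series:
    """Naive risk parity: weight each asset by 1/volatility, normalized to sum to 1."""
    inv_vol = 1.0 / returns.std()
    return inv_vol / inv_vol.sum()
```

Lower-volatility assets receive proportionally larger weights, and the result is a one-row weight vector ready to write to the weights CSV.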

- **Step 5: Run Backtest**
    - Action: Run the `scripts/run_backtest.py` script.
    - Purpose: Simulate the historical performance of your chosen strategy on the universe you prepared, generating a full set of analytics and reports.
    - Documentation: backtesting.md
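At its core, the simulation compounds the weighted portfolio return into an equity curve. The sketch below assumes daily rebalancing to fixed weights, which is a simplification of any real backtester's accounting (no costs, no drift):

```python
import pandas as pd

def equity_curve(returns: pd.DataFrame, weights: pd.Series, start: float = 1.0) -> pd.Series:
    """Compound the weighted portfolio return into an equity curve.

    Assumes rebalancing back to `weights` every period (an idealization).
    """
    portfolio_returns = returns.mul(weights, axis=1).sum(axis=1)
    return start * (1.0 + portfolio_returns).cumprod()
```

The resulting series is the kind of data that ends up in `equity_curve.csv`; summary statistics derived from it feed the JSON reports.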

- **Optional: Visualize Results and Summaries**
    - Action: Use the reporting utilities or your own plotting stack.
    - Documentation: visualization.md

## Manual (Ad-Hoc) Workflow

For debugging, testing, or one-off experiments, you can run the full suite of scripts individually in sequence. In this workflow, you provide all configuration via command-line arguments and pass the output file from one script as the input to the next.

The sequence is as follows:

1. Data Preparation — `scripts/prepare_tradeable_data.py` (data_preparation.md)
2. Asset Selection — `scripts/select_assets.py` (asset_selection.md)
3. Asset Classification — `scripts/classify_assets.py` (asset_classification.md)
4. Return Calculation — `scripts/calculate_returns.py` (calculate_returns.md)
5. Portfolio Construction — `scripts/construct_portfolio.py` (portfolio_construction.md)
6. Backtesting — `scripts/run_backtest.py` (backtesting.md)
7. Visualization — reporting utilities (visualization.md)

While this approach offers maximum flexibility, it is not recommended for regular use as it can be tedious and error-prone.
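If you find yourself repeating the manual sequence, it can be wrapped in a small driver that stops on the first failure. Note that the command-line arguments each script actually needs are omitted here; the command list is a placeholder skeleton, not the scripts' real CLIs:

```python
import subprocess
import sys

# The manual sequence, in order. Each script's output file is passed as the
# next script's input in practice; per-script arguments are omitted here.
PIPELINE = [
    [sys.executable, "scripts/prepare_tradeable_data.py"],
    [sys.executable, "scripts/select_assets.py"],
    [sys.executable, "scripts/classify_assets.py"],
    [sys.executable, "scripts/calculate_returns.py"],
    [sys.executable, "scripts/construct_portfolio.py"],
    [sys.executable, "scripts/run_backtest.py"],
]

def run_pipeline(commands=PIPELINE) -> None:
    for cmd in commands:
        subprocess.run(cmd, check=True)  # raise and abort on the first failure

if __name__ == "__main__":
    run_pipeline()
```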

## Production-Optimized Workflow

For large-scale production workloads, combine `scripts/manage_universes.py` with automated scheduling, fast I/O backends (polars/pyarrow), and caching layers (statistics caching, incremental resume) so that the entire pipeline can run nightly with minimal manual input.
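Of the layers mentioned, statistics caching can be as simple as memoizing expensive per-universe computations so nightly reruns skip repeated work. A minimal stdlib sketch; the `universe_volatility` function and its inputs are hypothetical stand-ins, not part of the toolkit:

```python
import statistics
from functools import lru_cache

@lru_cache(maxsize=128)
def universe_volatility(universe: str, as_of: str) -> float:
    """Hypothetical expensive statistic, cached by (universe, as_of) key."""
    # Stand-in for loading the universe's returns CSV and computing real stats.
    sample_returns = [0.01, -0.02, 0.015, 0.005]
    return statistics.stdev(sample_returns)
```

Keying the cache on an `as_of` date means a new trading day naturally produces a cache miss, while intraday reruns hit the cache.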

## Outputs by Step (Quick Reference)

- Data Prep → `data/metadata/tradeable_matches.csv`, `data/metadata/unmatched.csv`, exported prices directory
- Selection → `<name>_selected.csv` (schema preserves the match-report columns)
- Classification → `<name>_classified.csv` (adds `asset_class`/`sub_class`/`geography`/`confidence`)
- Returns → `<name>_returns.csv` (wide matrix, dates as rows)
- Construction → weights CSV (or a comparison CSV with per-strategy weights)
- Backtest → `config.json`, `summary_report.json`, `equity_curve.csv`, plus optional `trades.csv` and visualization CSVs

See architecture/INTERFACE_CONTRACTS.md for the canonical schemas and invariants.