Interface Contracts (Canonical)

This page is the single source of truth for CLI and file-schema contracts across the pipeline. Changes here require updating dependent docs, scripts, and tests in the same change.

Last updated: 2025-11-10

Data Preparation — `scripts/prepare_tradeable_data.py`

Inputs
Stooq tree at --data-dir with standard .txt files (dates in YYYYMMDD).
Tradeable CSVs dir --tradeable-dir containing CSVs with required columns: symbol, isin, name, market, currency.
CLI flags (stable set)
I/O: --data-dir, --tradeable-dir, --metadata-output, --match-report, --unmatched-report, --prices-output.
Behavior: --force-reindex, --overwrite-prices, --include-empty-prices, --lse-currency-policy {broker,stooq,strict}.
Incremental: --incremental, --cache-metadata (skip work if inputs unchanged and both reports exist).
Outputs and schemas
Exported prices directory: one <ticker>.csv (lowercased) per match; header ticker,per,date,time,open,high,low,close,volume,openint; rows mirror Stooq; dates YYYYMMDD.
Match report CSV columns (exact order): symbol, isin, market, name, currency, matched_ticker, stooq_path, region, category, strategy, source_file, price_start, price_end, price_rows, inferred_currency, resolved_currency, currency_status, data_status, data_flags.
Unmatched report CSV columns: symbol, isin, market, name, currency, source_file, reason.
Stooq index CSV columns: ticker, stem, relative_path, region, category.
Invariants
Deduplicate by ticker on export; first match wins.
Skip export when data_status is one of {empty, missing, missing_file} or starts with error:, unless --include-empty-prices is set.
data_status ∈ {ok, warning, sparse, empty, missing, missing_file, error:*}; data_flags is semicolon‑delimited (e.g., zero_volume_severity=high).
Primary consumers
Asset Selection (scripts/select_assets.py): filters by data_status and data_flags; requires price_start, price_end, and price_rows.
Universe Manager (uses match report + --prices-output directory to compute returns).
PriceLoader (expects exported CSV header as above).
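
The export invariants above can be sketched in a few lines. This is a minimal illustration, not the script's actual code: the helper names (`should_export`, `parse_data_flags`, `dedupe_by_ticker`) are hypothetical, and it assumes that --include-empty-prices also lifts the skip for error:* rows, which the contract text leaves open.

```python
# Column order copied from the match report contract above.
MATCH_REPORT_COLUMNS = [
    "symbol", "isin", "market", "name", "currency", "matched_ticker",
    "stooq_path", "region", "category", "strategy", "source_file",
    "price_start", "price_end", "price_rows", "inferred_currency",
    "resolved_currency", "currency_status", "data_status", "data_flags",
]

SKIP_STATUSES = {"empty", "missing", "missing_file"}


def should_export(data_status, include_empty_prices=False):
    """Return True when a matched row should get an exported price CSV.

    Assumption: the include flag overrides every skip reason, errors included.
    """
    if include_empty_prices:
        return True
    return data_status not in SKIP_STATUSES and not data_status.startswith("error:")


def parse_data_flags(data_flags):
    """Split a semicolon-delimited flags string into a key/value dict."""
    flags = {}
    for item in data_flags.split(";"):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition("=")
        flags[key] = value
    return flags


def dedupe_by_ticker(rows):
    """Keep the first row seen for each matched_ticker (first match wins)."""
    seen = set()
    unique = []
    for row in rows:
        ticker = row["matched_ticker"].lower()
        if ticker not in seen:
            seen.add(ticker)
            unique.append(row)
    return unique
```

A consumer could call `should_export(row["data_status"])` per match-report row before expecting a `<ticker>.csv` in the exported prices directory.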

Asset Selection — `scripts/select_assets.py`

Inputs
tradeable_matches.csv with columns: symbol, isin, market, name, currency, matched_ticker, stooq_path, region, category, strategy, source_file, price_start, price_end, price_rows, inferred_currency, resolved_currency, currency_status, data_status, data_flags.
Optional allowlist/blocklist files (one symbol/ISIN per line).

CLI flags (stable set)
Core filters: --match-report, --output, --data-status, --min-history-days, --min-price-rows, --max-gap-days.
Additional filters: --severity, --markets, --regions, --currencies.
Overrides: --allowlist, --blocklist, --dry-run, --chunk-size.
Operational: --verbose for logging.
Outputs
Selected assets CSV written to --output, preserving the match report schema but only containing assets that passed the filters.
Console summary when --dry-run is set (includes counts and breakdowns by market/region).
Streaming invariants
--chunk-size > 0 enables streaming mode (process_chunked), which bounds memory usage while ensuring the allowlist is validated across all chunks.
Without --chunk-size, entire match report is loaded and piped through AssetSelector.select_assets() (traditional behavior).
Primary consumers
scripts/classify_assets.py and the universe loaders expect the selected assets CSV plus its data_status/data_flags diagnostics.
Downstream builders rely on price_start, price_end, and price_rows for history assertions and constraints.
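
To make the streaming invariant concrete, here is a rough sketch of chunked filtering using only the standard library. It is not the real process_chunked: `iter_chunks` and `select_streaming` are hypothetical names, and it assumes that an allowlisted symbol is always kept and that "allowlist validation" means reporting allowlist entries never seen in any chunk.

```python
import csv
import io
from itertools import islice


def iter_chunks(reader, chunk_size):
    """Yield lists of at most chunk_size rows from a csv.DictReader."""
    while True:
        chunk = list(islice(reader, chunk_size))
        if not chunk:
            return
        yield chunk


def select_streaming(handle, chunk_size, allowed_statuses, allowlist=frozenset()):
    """Filter a match report chunk by chunk.

    Returns the selected rows plus the allowlist entries that never appeared,
    so validation covers the whole file rather than a single chunk.
    """
    reader = csv.DictReader(handle)
    selected = []
    seen_allowlisted = set()
    for chunk in iter_chunks(reader, chunk_size):
        for row in chunk:
            if row["symbol"] in allowlist:
                # Assumption: allowlisted symbols bypass the status filter.
                seen_allowlisted.add(row["symbol"])
                selected.append(row)
            elif row["data_status"] in allowed_statuses:
                selected.append(row)
    missing = set(allowlist) - seen_allowlisted
    return selected, missing


sample = "symbol,data_status\nAAA,ok\nBBB,empty\nCCC,ok\n"
rows, missing = select_streaming(io.StringIO(sample), 2, {"ok"}, {"BBB", "ZZZ"})
```

Only one chunk of rows is held at a time while filtering; the set of seen allowlist entries is the only state carried across chunks.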

Asset Classification — `scripts/classify_assets.py`

Inputs
Selected assets CSV produced by select_assets.py.
Optional overrides CSV keyed by symbol/isin.
CLI flags (stable set)
--input, --output, --overrides, --export-for-review, --summary, --verbose.
Outputs
Classified assets CSV at --output, containing the original columns plus asset_class, sub_class, geography, and confidence.
Optional review template (--export-for-review) for manual overrides.
Console summary (--summary) reporting class/geography breakdown and low-confidence rows.
Invariants
Manual overrides take precedence and propagate a confidence of 1.0.
The classifier appends at least asset_class, sub_class, geography, and confidence columns for every asset.
Primary consumers
calculate_returns.py expects the classified assets schema plus diagnostics for filtering/history coverage.
Override artifacts feed future classification runs or documentation.
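
The override invariant can be illustrated as follows. This is a sketch under assumptions, not the classifier's implementation: `apply_overrides` is a hypothetical helper, rows are plain dicts rather than DataFrame rows, and overrides are looked up by symbol first and then by ISIN.

```python
def apply_overrides(rows, overrides):
    """Merge manual overrides into classified rows.

    rows: list of dicts carrying asset_class, sub_class, geography, confidence.
    overrides: dict keyed by symbol or ISIN, mapping to the fields to replace.
    Overridden rows get confidence 1.0, matching the invariant above.
    """
    result = []
    for row in rows:
        merged = dict(row)  # never mutate the caller's data
        override = overrides.get(row.get("symbol"))
        if override is None:
            override = overrides.get(row.get("isin"))
        if override is not None:
            merged.update(override)
            merged["confidence"] = 1.0
        result.append(merged)
    return result


classified = [
    {"symbol": "AAA", "isin": "X1", "asset_class": "equity",
     "sub_class": "large_cap", "geography": "US", "confidence": 0.6},
    {"symbol": "BBB", "isin": "X2", "asset_class": "bond",
     "sub_class": "government", "geography": "EU", "confidence": 0.9},
]
final = apply_overrides(classified, {"AAA": {"asset_class": "etf"}})
```

Rows without an override keep the classifier's own confidence, so low-confidence rows remain visible in the --summary output.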

Returns — `scripts/calculate_returns.py`

Inputs
Classified assets CSV (required) with price_rows, stooq_path, and coverage metadata.
Cleaned price CSV directory (--prices-dir) from data preparation.
CLI flags (stable set)
--assets, --prices-dir, --output, --method, --frequency, --handle-missing, --align-method, --cache-size, --loader-workers, --io-backend.
Tuning/logging: --risk-free-rate, --max-forward-fill, --min-periods, --min-coverage, --business-days, --summary, --top, --verbose.
Outputs
Returns Matrix CSV (wide format) with dates as rows and asset tickers as columns at --output.
Optional textual summary when --summary is provided.
Invariants
Alignment occurs via --align-method (outer/inner) before missing-data handling.
Coverage filtering removes assets failing --min-coverage.
Cache size influences memory/performance; 0 disables caching.
Primary consumers
Portfolio construction scripts expect the returns matrix and any summary logs for diagnostics.
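
The ordering in the invariants (align first, then judge coverage) can be sketched in plain Python. The function names are hypothetical and the data layout (one dict of date to return per ticker, with None marking a missing value) is an assumption made for illustration; the real script works on pandas structures.

```python
def align(series_by_ticker, how="outer"):
    """Align per-ticker {date: return} dicts on a shared, sorted date index.

    how="outer" uses the union of dates, how="inner" the intersection.
    Missing observations become None.
    """
    date_sets = [set(series) for series in series_by_ticker.values()]
    if not date_sets:
        return [], {}
    if how == "inner":
        index = set.intersection(*date_sets)
    elif how == "outer":
        index = set.union(*date_sets)
    else:
        raise ValueError(f"unknown align method: {how}")
    index = sorted(index)  # YYYYMMDD strings sort chronologically
    matrix = {
        ticker: [series.get(day) for day in index]
        for ticker, series in series_by_ticker.items()
    }
    return index, matrix


def filter_by_coverage(index, matrix, min_coverage):
    """Drop tickers whose share of non-missing values is below min_coverage."""
    kept = {}
    for ticker, values in matrix.items():
        coverage = sum(v is not None for v in values) / len(index) if index else 0.0
        if coverage >= min_coverage:
            kept[ticker] = values
    return kept


returns = {
    "aaa": {"20240102": 0.01, "20240103": 0.02, "20240104": -0.01},
    "bbb": {"20240103": 0.05},
}
index, matrix = align(returns, how="outer")
kept = filter_by_coverage(index, matrix, min_coverage=0.5)
```

With outer alignment the sparse ticker is measured against the full date range and falls below the threshold; with inner alignment the index shrinks to the shared dates, so the same ticker would pass.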

Universe Management — `scripts/manage_universes.py`

Inputs
config/universes.yaml (required) defining filter_criteria, classification_requirements, return_config, constraints, and optional advanced blocks.
data/metadata/tradeable_matches.csv and data/processed/tradeable_prices for selection/classification/returns when loading a universe.

config/universes.yaml is the central definition file: the service pipeline reads each universe's filters, classification requirements, return configuration, and constraint blocks from it.

CLI commands (stable)
Legacy: list, show, load, validate, compare.
Service commands: validate-file, compare-files, plus load <name> (legacy), which orchestrates the asset selection → classification → returns pipeline.
Shared flags: --config, --matches, --prices-dir, --verbose, --output-dir, --status.
Outputs
Execution logs plus selected/classified/returns artifacts under the optional --output-dir.
Validation/comparison reports printed to console when running validate-file/compare-files.
Invariants
Universe definitions must include core blocks before advanced ones; validation ensures required keys exist.
Loading a universe applies selection → classification → returns in order, honoring the config blocks.
Primary consumers
Portfolio construction/backtesting scripts rely on the universes produced by the service (selected set, classification metadata, returns).
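
A required-key check of the kind the validation invariant describes might look like the sketch below. It assumes the four blocks named under Inputs are the "core blocks" and that a parsed universe definition is a plain dict; `validate_universe` is a hypothetical helper, not the service's API.

```python
# Assumption: these four blocks from the Inputs description are the core ones.
CORE_BLOCKS = (
    "filter_criteria",
    "classification_requirements",
    "return_config",
    "constraints",
)


def validate_universe(name, definition):
    """Return a list of human-readable problems; an empty list means valid."""
    problems = []
    if not isinstance(definition, dict):
        return [f"{name}: definition must be a mapping"]
    for block in CORE_BLOCKS:
        if block not in definition:
            problems.append(f"{name}: missing required block '{block}'")
    return problems


# A parsed universes.yaml would yield a dict shaped roughly like this.
universes = {
    "core_global": {
        "filter_criteria": {},
        "classification_requirements": {},
        "return_config": {},
        "constraints": {},
    },
    "incomplete": {"filter_criteria": {}},
}
report = {name: validate_universe(name, spec) for name, spec in universes.items()}
```

Collecting all problems instead of failing on the first one suits a validate command, since the user sees every missing block in a single run.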

Portfolio Construction — `scripts/construct_portfolio.py`

Inputs
Returns matrix CSV (--returns).
Optional classifications CSV for constraint-aware strategies (--classifications).
CLI flags (stable)
--strategy (equal_weight/risk_parity/mean_variance variants), --returns, --output.
Constraints: --max-weight, --min-weight, --max-equity, --min-bond, etc.
Comparison: --compare, --strategy overrides.
Outputs
Single strategy weights CSV (--output).
The weights CSV is the direct input to the run_backtest.py flow.
Comparison CSV with per-strategy weights when --compare is supplied.
Invariants
Constraints validated post-optimization; fallback to safer defaults on failure.
Weight columns normalized to sum to 1 (100% allocation).
Primary consumers
run_backtest.py consumes the weights CSV to simulate trades.
Reporting/visualization expect each strategy’s weight distribution for comparison tables.
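
The two invariants (weights sum to 1, constraints checked after optimization with a fallback) can be sketched as below. The helper names are hypothetical, and using equal weights as the "safer default" is an assumption; the contract does not say which fallback the script applies.

```python
import math


def normalize_weights(raw):
    """Scale non-negative raw weights so they sum to 1."""
    total = sum(raw.values())
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return {ticker: value / total for ticker, value in raw.items()}


def bound_violations(weights, min_weight=0.0, max_weight=1.0, tol=1e-9):
    """List tickers whose weight falls outside [min_weight, max_weight]."""
    return [
        ticker
        for ticker, weight in weights.items()
        if weight < min_weight - tol or weight > max_weight + tol
    ]


def finalize(raw, min_weight=0.0, max_weight=1.0):
    """Normalize, validate post-optimization, and fall back if bounds fail.

    Assumption: the fallback is an equal-weight portfolio.
    """
    weights = normalize_weights(raw)
    if bound_violations(weights, min_weight, max_weight):
        weights = {ticker: 1.0 / len(raw) for ticker in raw}
    return weights


optimized = {"aaa": 2.0, "bbb": 1.0, "ccc": 1.0}
weights = normalize_weights(optimized)
safe = finalize(optimized, max_weight=0.4)
```

Here the normalized weight of `aaa` is 0.5, which breaches a 0.4 cap, so `finalize` returns the equal-weight fallback instead.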

Backtesting — `scripts/run_backtest.py`

Inputs
Strategy name plus returns/prices/universe inputs (--returns-file, --prices-file, --universe-file).
Capital/config (--initial-capital, --start-date, --end-date).
CLI flags/key behaviors
Rebalance & execution: --rebalance-frequency, --commission, --slippage, --rebalance-method.
Advanced filters: --preselect-method, --use-pit-eligibility, --membership-enabled.
Outputs control: --output-dir, --save-trades, --no-visualize.
Logging: --verbose, --dry-run (if available).
Outputs
Configuration/metrics JSON (config.json, summary_report.json, metrics.json).
Core CSVs: equity_curve.csv (essential for visualization), trades.csv, and assorted visualization CSVs.
Optional visual artifacts: rolling metrics, drawdowns, cost breakdowns.
Invariants
Sequence: load universe → optionally apply PIT/preselection/membership → execute strategy → simulate trades with transaction costs.
All outputs namespaced under --output-dir.
Primary consumers
Visualization/reporting modules consume the CSV outputs.
Governance/tracking systems rely on config.json + metrics for review.
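
A downstream consumer might verify the output namespace before reading results. The sketch below uses the file names listed under Outputs; which of them are mandatory, and that trades.csv appears only with --save-trades, are assumptions, and both function names are hypothetical.

```python
from pathlib import Path

# Assumption: these files are always written by a completed run.
REQUIRED_OUTPUTS = (
    "config.json",
    "summary_report.json",
    "metrics.json",
    "equity_curve.csv",
)


def expected_outputs(output_dir, save_trades=False):
    """Build the paths a completed run should have written under output_dir."""
    names = list(REQUIRED_OUTPUTS)
    if save_trades:
        # Assumption: trades.csv is tied to the --save-trades flag.
        names.append("trades.csv")
    return [Path(output_dir) / name for name in names]


def missing_outputs(output_dir, save_trades=False):
    """Return the names of expected files that are absent."""
    return [
        path.name
        for path in expected_outputs(output_dir, save_trades)
        if not path.is_file()
    ]
```

Calling `missing_outputs("results/run_01", save_trades=True)` on a finished run should return an empty list; the directory name here is only an example.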

Visualization & Reporting — `src/portfolio_management/reporting/visualization/`

Inputs
Backtest result objects (equity curve, rebalance events, trades, metrics).
API contract
Functions like prepare_equity_curve, prepare_drawdown_series, create_summary_report, prepare_allocation_history return pandas data structures ready for plotting.
Expect equity/trade DataFrames indexed by date with columns equity, trades, returns.
Outputs
DataFrames/CSV-ready dictionaries for normalized equity, drawdowns, summaries, allocations, comparisons, transaction costs, and rolling metrics.
Visualization module produces chart-ready artifacts for docs, dashboards, and notebooks.
Used by docs/examples and downstream CLI helpers for rendering.
Invariants
Equity/drawdown helpers always return time-series with aligned dates.
Summary/report helpers include standard fields (Sharpe, volatility, max drawdown, turnover) for parity with summary_report.json.
Primary consumers
MkDocs visualizations, dashboards, and executive reports consume the data these helpers return.
Downstream CLI or notebook workflows can pipe these into matplotlib/plotly charts.
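
For readers unfamiliar with the equity and drawdown helpers, the underlying arithmetic is short. This sketch is not the module's prepare_equity_curve or prepare_drawdown_series; the functions below are hypothetical stand-ins that take a list of (date, equity) pairs with positive equity values instead of a DataFrame.

```python
def normalized_equity(equity):
    """Rescale an equity curve so the first observation equals 1.0."""
    first = equity[0][1]
    return [(day, value / first) for day, value in equity]


def drawdown_series(equity):
    """Compute the drawdown at each date as equity / running peak - 1.

    The output has exactly one entry per input date, so both series
    stay aligned, as the invariant above requires.
    """
    peak = float("-inf")
    series = []
    for day, value in equity:
        peak = max(peak, value)
        series.append((day, value / peak - 1.0))
    return series


def max_drawdown(equity):
    """Return the most negative drawdown (0.0 if equity never falls)."""
    return min(dd for _, dd in drawdown_series(equity))


curve = [("2024-01-02", 100.0), ("2024-01-03", 110.0), ("2024-01-04", 99.0)]
drawdowns = drawdown_series(curve)
```

In this example the curve peaks at 110 and then falls to 99, a drawdown of about 10% from the peak.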

Universe Management — `assets.universes.UniverseManager`

Inputs
config/universes.yaml (definitions, constraints, return config)
Match report DataFrame
Exported prices directory from Data Prep
Output: dict[str, Any] with keys assets (DataFrame), classifications (DataFrame), returns (DataFrame), metadata (Series).

Returns — `analytics.returns.PriceLoader`

Input contract: price CSVs under the prices directory with the normalized header ticker,per,date,time,open,high,low,close,volume,openint and dates parsable to a time index.
Output: aligned price DataFrame or Series with close values; filters non‑positive closes and de‑duplicates dates.
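
The input and output contract can be illustrated with a standard-library reader. This is a sketch, not PriceLoader itself: `load_closes` is a hypothetical function, it returns a plain dict rather than a pandas Series, and keeping the first valid row for a duplicated date is an assumption.

```python
import csv
import io
from datetime import datetime

PRICE_HEADER = [
    "ticker", "per", "date", "time", "open",
    "high", "low", "close", "volume", "openint",
]


def load_closes(handle):
    """Read one exported price CSV into an ordered {date: close} dict.

    Rejects files whose header differs from the normalized header,
    drops non-positive closes, and de-duplicates dates.
    """
    reader = csv.DictReader(handle)
    if reader.fieldnames != PRICE_HEADER:
        raise ValueError(f"unexpected header: {reader.fieldnames}")
    closes = {}
    for row in reader:
        close = float(row["close"])
        if close <= 0:
            continue  # filter non-positive closes
        day = datetime.strptime(row["date"], "%Y%m%d").date()
        # Assumption: the first valid row for a date wins.
        closes.setdefault(day, close)
    return dict(sorted(closes.items()))


sample = (
    "ticker,per,date,time,open,high,low,close,volume,openint\n"
    "abc,D,20240103,000000,10,11,9,10.5,1000,0\n"
    "abc,D,20240102,000000,10,11,9,0,1000,0\n"
    "abc,D,20240103,000000,10,11,9,10.7,1000,0\n"
    "abc,D,20240104,000000,10,11,9,10.9,1000,0\n"
)
closes = load_closes(io.StringIO(sample))
```

Checking the header up front gives an early, explicit failure if the Data Preparation export schema changes without this page being updated.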

Change Policy

Any change to flag names, required columns, or CSV schemas above is a breaking change.
Required steps when changing contracts:
Update this page.
Update CLI --help strings and user docs that reference the interface.
Update dependent tests (tests/scripts, tests/integration).
Note the change in memory-bank/progress.md and, if applicable, add an ADR.
