
Interface Contracts (Canonical)

This page is the single source of truth for CLI and file-schema contracts across the pipeline. Changes here require updating dependent docs, scripts, and tests in the same change.

Last updated: 2025-11-10

Data Preparation — scripts/prepare_tradeable_data.py

  • Inputs

  • Stooq tree at --data-dir with standard .txt files (dates in YYYYMMDD).

  • Tradeable CSVs dir --tradeable-dir containing CSVs with required columns: symbol, isin, name, market, currency.

  • CLI flags (stable set)

  • I/O: --data-dir, --tradeable-dir, --metadata-output, --match-report, --unmatched-report, --prices-output.

  • Behavior: --force-reindex, --overwrite-prices, --include-empty-prices, --lse-currency-policy {broker,stooq,strict}.
  • Incremental: --incremental, --cache-metadata (skip work if inputs unchanged and both reports exist).

  • Outputs and schemas

  • Exported prices directory: one <ticker>.csv (lowercased) per match; header ticker,per,date,time,open,high,low,close,volume,openint; rows mirror Stooq; dates YYYYMMDD.

  • Match report CSV columns (exact order): symbol, isin, market, name, currency, matched_ticker, stooq_path, region, category, strategy, source_file, price_start, price_end, price_rows, inferred_currency, resolved_currency, currency_status, data_status, data_flags.
  • Unmatched report CSV columns: symbol, isin, market, name, currency, source_file, reason.
  • Stooq index CSV columns: ticker, stem, relative_path, region, category.
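
A consumer can guard against schema drift by comparing a report's header row with the match report column list above. The following is a minimal sketch using only the standard library; the function name `check_match_report_header` is hypothetical and is not part of the pipeline.

```python
import csv
import io

# Column order copied from the match report contract above.
MATCH_REPORT_COLUMNS = [
    "symbol", "isin", "market", "name", "currency", "matched_ticker",
    "stooq_path", "region", "category", "strategy", "source_file",
    "price_start", "price_end", "price_rows", "inferred_currency",
    "resolved_currency", "currency_status", "data_status", "data_flags",
]


def check_match_report_header(handle):
    """Raise ValueError unless the first CSV row equals the contract header."""
    header = next(csv.reader(handle), [])
    if header != MATCH_REPORT_COLUMNS:
        missing = [c for c in MATCH_REPORT_COLUMNS if c not in header]
        extra = [c for c in header if c not in MATCH_REPORT_COLUMNS]
        raise ValueError(f"header mismatch: missing={missing} extra={extra}")
    return header


# Usage with an in-memory file; a real caller would pass open(path, newline="").
sample = io.StringIO(",".join(MATCH_REPORT_COLUMNS) + "\n")
check_match_report_header(sample)
```

Because the comparison is on the full list, a reordered header fails the check even when no column is missing, which matches the "exact order" requirement.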

  • Invariants

  • Deduplicate by ticker on export; first match wins.

  • Skip export when data_status is one of {empty, missing, missing_file} or starts with error:, unless --include-empty-prices is set.
  • data_status ∈ {ok, warning, sparse, empty, missing, missing_file, error:*}; data_flags is semicolon‑delimited (e.g., zero_volume_severity=high).
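
The invariants above can be sketched as three small helpers. This is an illustration only: the function names are hypothetical, and treating each data_flags item as a key=value pair is an assumption based on the single example given.

```python
SKIP_STATUSES = {"empty", "missing", "missing_file"}


def dedupe_first_match(rows):
    """Keep only the first row seen for each matched_ticker (first match wins)."""
    seen, kept = set(), []
    for row in rows:
        ticker = row["matched_ticker"]
        if ticker not in seen:
            seen.add(ticker)
            kept.append(row)
    return kept


def should_export(data_status, include_empty_prices=False):
    """Apply the skip rule: empty, missing, and error statuses are skipped by default."""
    skipped = data_status in SKIP_STATUSES or data_status.startswith("error:")
    return include_empty_prices or not skipped


def parse_data_flags(data_flags):
    """Split a semicolon-delimited flags string into a dict (assumed key=value items)."""
    flags = {}
    for item in data_flags.split(";"):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition("=")
        flags[key] = value  # a bare flag without "=" maps to an empty string
    return flags
```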

  • Primary consumers

  • Asset Selection and scripts/select_assets.py (filters by data_status and data_flags; requires price_start, price_end, and price_rows).

  • Universe Manager (uses match report + --prices-output directory to compute returns).
  • PriceLoader (expects exported CSV header as above).

Asset Selection — scripts/select_assets.py

  • Inputs

  • tradeable_matches.csv with columns: symbol, isin, market, name, currency, matched_ticker, stooq_path, region, category, strategy, source_file, price_start, price_end, price_rows, inferred_currency, resolved_currency, currency_status, data_status, data_flags.

  • Optional allowlist/blocklist files (one symbol/ISIN per line).


  • CLI flags (stable set)

  • Core filters: --match-report, --output, --data-status, --min-history-days, --min-price-rows, --max-gap-days.

  • Additional filters: --severity, --markets, --regions, --currencies.
  • Overrides: --allowlist, --blocklist, --dry-run, --chunk-size.
  • Operational: --verbose for logging.

  • Outputs

  • Selected assets CSV written to --output, preserving the match report schema but only containing assets that passed the filters.

  • Console summary when --dry-run is set (includes counts and breakdowns by market/region).

  • Streaming invariants

  • --chunk-size > 0 enables streaming mode (process_chunked), which bounds memory usage while ensuring the allowlist is still validated across all chunks.

  • Without --chunk-size, the entire match report is loaded and passed through AssetSelector.select_assets() (the traditional behavior).
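
To illustrate why allowlist validation has to span chunks, here is a minimal standard-library sketch of chunked filtering. It is not the actual process_chunked implementation: `iter_chunks` and `select_streaming` are hypothetical names, and the rule that an allowlisted symbol is kept regardless of data_status is an assumption made for the example.

```python
import csv
import io
from itertools import islice


def iter_chunks(reader, chunk_size):
    """Yield lists of at most chunk_size rows from a csv.DictReader."""
    while True:
        chunk = list(islice(reader, chunk_size))
        if not chunk:
            return
        yield chunk


def select_streaming(handle, chunk_size, allowed_statuses, allowlist=frozenset()):
    """Filter rows chunk by chunk; return (selected rows, allowlist entries never seen)."""
    selected, seen = [], set()
    for chunk in iter_chunks(csv.DictReader(handle), chunk_size):
        for row in chunk:
            seen.add(row["symbol"])
            # Assumption: allowlisted symbols bypass the status filter.
            if row["symbol"] in allowlist or row["data_status"] in allowed_statuses:
                # A real run would write to --output here instead of accumulating.
                selected.append(row)
    # Only after the last chunk is it known which allowlist entries never appeared.
    return selected, set(allowlist) - seen


report = io.StringIO("symbol,data_status\nAAA,ok\nBBB,empty\nCCC,warning\n")
rows, unseen = select_streaming(report, 2, {"ok"}, {"BBB", "ZZZ"})
```

The set of symbols seen so far is the only state carried between chunks, so memory stays bounded by the chunk size plus that set.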

  • Primary consumers

  • scripts/classify_assets.py and the universe loaders expect the selected assets CSV plus its data_status/data_flags diagnostics.

  • Downstream builders rely on price_start, price_end, and price_rows for history assertions and constraints.

Asset Classification — scripts/classify_assets.py

  • Inputs

  • Selected assets CSV produced by select_assets.py.

  • Optional overrides CSV keyed by symbol/isin.

  • CLI flags (stable set)

  • --input, --output, --overrides, --export-for-review, --summary, --verbose.

  • Outputs

  • Classified assets CSV at --output, containing the original columns plus asset_class, sub_class, geography, and confidence.

  • Optional review template (--export-for-review) for manual overrides.
  • Console summary (--summary) reporting class/geography breakdown and low-confidence rows.

  • Invariants

  • Manual overrides take precedence and propagate a confidence of 1.0.

  • The classifier appends at least asset_class, sub_class, geography, and confidence columns for every asset.
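
The override precedence rule can be sketched as follows. The function name `apply_overrides` and the choice to key overrides by a (symbol, isin) tuple are assumptions for illustration; the source only states that overrides are keyed by symbol/isin.

```python
CLASSIFICATION_FIELDS = ("asset_class", "sub_class", "geography")


def apply_overrides(classified, overrides):
    """Return new rows in which a matching override replaces the classifier output.

    classified: list of dicts with symbol, isin, and the classification columns.
    overrides: dict mapping (symbol, isin) to a dict of overridden fields.
    """
    result = []
    for row in classified:
        merged = dict(row)  # copy so the input rows are left untouched
        override = overrides.get((row["symbol"], row["isin"]))
        if override:
            merged.update({f: override[f] for f in CLASSIFICATION_FIELDS if f in override})
            merged["confidence"] = 1.0  # manual overrides always carry full confidence
        result.append(merged)
    return result


classified = [
    {"symbol": "AAA", "isin": "X1", "asset_class": "equity",
     "sub_class": "large_cap", "geography": "us", "confidence": 0.6},
    {"symbol": "BBB", "isin": "X2", "asset_class": "bond",
     "sub_class": "government", "geography": "uk", "confidence": 0.9},
]
overrides = {("AAA", "X1"): {"asset_class": "commodity"}}
updated = apply_overrides(classified, overrides)
```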

  • Primary consumers

  • calculate_returns.py expects the classified assets schema plus diagnostics for filtering/history coverage.

  • Override artifacts feed future classification runs or documentation.

Returns — scripts/calculate_returns.py

  • Inputs

  • Classified assets CSV (required) with price_rows, stooq_path, and coverage metadata.

  • Cleaned price CSV directory (--prices-dir) from data preparation.

  • CLI flags (stable set)

  • --assets, --prices-dir, --output, --method, --frequency, --handle-missing, --align-method, --cache-size, --loader-workers, --io-backend.

  • Tuning/logging: --risk-free-rate, --max-forward-fill, --min-periods, --min-coverage, --business-days, --summary, --top, --verbose.

  • Outputs

  • Returns Matrix CSV (wide format) with dates as rows and asset tickers as columns at --output.

  • Optional textual summary when --summary is provided.

  • Invariants

  • Alignment occurs via --align-method (outer/inner) before missing-data handling.

  • Coverage filtering removes assets failing --min-coverage.
  • Cache size influences memory/performance; 0 disables caching.
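
The ordering of alignment and coverage filtering can be shown with a small pure-Python sketch. The helper names are hypothetical, and defining coverage as the share of non-missing observations over the aligned date index is an assumption; the script may compute it differently.

```python
def align_dates(series_by_ticker, method="outer"):
    """Return the sorted shared date index: union for 'outer', intersection for 'inner'."""
    date_sets = [set(series) for series in series_by_ticker.values()]
    if not date_sets:
        return []
    if method == "outer":
        dates = set().union(*date_sets)
    elif method == "inner":
        dates = set.intersection(*date_sets)
    else:
        raise ValueError(f"unknown align method: {method}")
    return sorted(dates)


def filter_by_coverage(series_by_ticker, dates, min_coverage):
    """Keep tickers whose share of non-missing values over dates is >= min_coverage."""
    kept = {}
    for ticker, series in series_by_ticker.items():
        present = sum(1 for d in dates if series.get(d) is not None)
        if dates and present / len(dates) >= min_coverage:
            kept[ticker] = series
    return kept


returns = {
    "aaa": {"20240102": 0.01, "20240103": 0.02, "20240104": -0.01},
    "bbb": {"20240103": 0.00},
}
outer = align_dates(returns, "outer")
inner = align_dates(returns, "inner")
kept = filter_by_coverage(returns, outer, min_coverage=0.5)
```

With outer alignment the sparse series is measured against the full union of dates, so it is the more likely one to fail the coverage threshold; with inner alignment every surviving date is present in every series.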

  • Primary consumers

  • Portfolio construction scripts expect the returns matrix and any summary logs for diagnostics.

Universe Management — scripts/manage_universes.py

  • Inputs

  • config/universes.yaml (required) defining filter_criteria, classification_requirements, return_config, constraints, and optional advanced blocks.

  • data/metadata/tradeable_matches.csv and data/processed/tradeable_prices for selection/classification/returns when loading a universe.

The config/universes.yaml file is the central blueprint: its filters, classification requirements, return configs, and constraint blocks drive each stage of the service pipeline.

  • CLI commands (stable)

  • Legacy: list, show, load, validate, compare.

  • Service commands: validate-file, compare-files, plus load <name> (legacy), which orchestrates the asset selection → classification → returns pipeline.
  • Shared flags: --config, --matches, --prices-dir, --verbose, --output-dir, --status.

  • Outputs

  • Execution logs plus selected/classified/returns artifacts under the optional --output-dir.

  • Validation/comparison reports printed to console when running validate-file/compare-files.

  • Invariants

  • Universe definitions must include core blocks before advanced ones; validation ensures required keys exist.

  • Loading a universe applies selection → classification → returns in order, honoring the config blocks.
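
A required-key check of the kind described above might look like the sketch below. It operates on an already-parsed definition (a plain dict) to stay dependency-free; treating the four blocks named under Inputs as the required core set is an assumption, as is the function name `validate_universe`.

```python
# Assumed core blocks, taken from the config/universes.yaml description above.
CORE_BLOCKS = (
    "filter_criteria",
    "classification_requirements",
    "return_config",
    "constraints",
)


def validate_universe(name, definition):
    """Return a list of problems; an empty list means all core blocks are present."""
    return [
        f"{name}: missing required block '{block}'"
        for block in CORE_BLOCKS
        if block not in definition
    ]


# Hypothetical universe definition, as it might look after YAML parsing.
example = {
    "filter_criteria": {"data_status": ["ok"]},
    "return_config": {"frequency": "monthly"},
}
problems = validate_universe("core_global", example)
```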

  • Primary consumers

  • Portfolio construction/backtesting scripts rely on the universes produced by the service (selected set, classification metadata, returns).

Portfolio Construction — scripts/construct_portfolio.py

  • Inputs

  • Returns matrix CSV (--returns).

  • Optional classifications CSV for constraint-aware strategies (--classifications).

  • CLI flags (stable)

  • --strategy (equal_weight/risk_parity/mean_variance variants), --returns, --output.

  • Constraints: --max-weight, --min-weight, --max-equity, --min-bond, etc.
  • Comparison: --compare, --strategy overrides.

  • Outputs

  • Single strategy weights CSV (--output).

  • Portfolio Weights CSV is the direct input for the run_backtest flow.
  • Comparison CSV with per-strategy weights when --compare is supplied.

  • Invariants

  • Constraints are validated post-optimization; on failure, the script falls back to safer defaults.

  • Weight columns normalized to sum to 1 (100% allocation).
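
The two invariants can be sketched together. The source does not say what the "safer defaults" are, so the equal-weight fallback below is purely an assumption, and the helper names are hypothetical.

```python
def normalize_weights(weights):
    """Scale weights so they sum to 1."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return {asset: w / total for asset, w in weights.items()}


def violates_bounds(weights, min_weight, max_weight, tol=1e-12):
    """Return True if any weight falls outside [min_weight, max_weight]."""
    return any(w < min_weight - tol or w > max_weight + tol for w in weights.values())


def finalize_weights(weights, min_weight=0.0, max_weight=1.0):
    """Normalize, validate bounds, and fall back to equal weights on violation."""
    normalized = normalize_weights(weights)
    if violates_bounds(normalized, min_weight, max_weight):
        # Assumed fallback; equal weights may still breach a tight max_weight.
        n = len(weights)
        return {asset: 1.0 / n for asset in weights}
    return normalized


raw = {"aaa": 2.0, "bbb": 1.0, "ccc": 1.0}
within = finalize_weights(raw, max_weight=0.6)
fallback = finalize_weights(raw, max_weight=0.4)
```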

  • Primary consumers

  • run_backtest.py consumes the weights CSV to simulate trades.

  • Reporting/visualization expect each strategy’s weight distribution for comparison tables.

Backtesting — scripts/run_backtest.py

  • Inputs

  • Strategy name plus returns/prices/universe inputs (--returns-file, --prices-file, --universe-file).

  • Capital/config (--initial-capital, --start-date, --end-date).

  • CLI flags/key behaviors

  • Rebalance & execution: --rebalance-frequency, --commission, --slippage, --rebalance-method.

  • Advanced filters: --preselect-method, --use-pit-eligibility, --membership-enabled.
  • Outputs control: --output-dir, --save-trades, --no-visualize.
  • Logging: --verbose, --dry-run (if available).

  • Outputs

  • Configuration/metrics JSON (config.json, summary_report.json, metrics.json).

  • Core CSVs: equity_curve.csv (essential for visualization), trades.csv, and assorted visualization CSVs.
  • Optional visual artifacts: rolling metrics, drawdowns, cost breakdowns.

  • Invariants

  • Sequence: load universe → optionally apply PIT/preselection/membership → execute strategy → simulate trades with transaction costs.

  • All outputs namespaced under --output-dir.

  • Primary consumers

  • Visualization/reporting modules consume the CSV outputs.

  • Governance/tracking systems rely on config.json + metrics for review.

Visualization & Reporting — src/portfolio_management/reporting/visualization/

  • Inputs

  • Backtest result objects (equity curve, rebalance events, trades, metrics).

  • API contract

  • Functions like prepare_equity_curve, prepare_drawdown_series, create_summary_report, prepare_allocation_history return pandas data structures ready for plotting.

  • These functions expect equity/trade DataFrames indexed by date with columns equity, trades, returns.

  • Outputs

  • DataFrames/CSV-ready dictionaries for normalized equity, drawdowns, summaries, allocations, comparisons, transaction costs, and rolling metrics.

  • Visualization module produces chart-ready artifacts for docs, dashboards, and notebooks.
  • Used by docs/examples and downstream CLI helpers for rendering.

  • Invariants

  • Equity/drawdown helpers always return time-series with aligned dates.

  • Summary/report helpers include standard fields (Sharpe, volatility, max drawdown, turnover) for parity with summary_report.json.
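
As an illustration of the "aligned dates" invariant, the sketch below derives a drawdown series that carries exactly the same dates as its input equity curve. It is not the module's prepare_drawdown_series; the function name `drawdown_series` and the list-of-tuples input are assumptions chosen to avoid a pandas dependency.

```python
def drawdown_series(equity):
    """Map each (date, equity) point to (date, equity / running_peak - 1).

    Assumes strictly positive equity values in chronological order.
    """
    peak = float("-inf")
    out = []
    for day, value in equity:
        peak = max(peak, value)  # highest equity observed so far
        out.append((day, value / peak - 1.0))
    return out


curve = [
    ("2024-01-02", 100.0),
    ("2024-01-03", 110.0),
    ("2024-01-04", 99.0),
    ("2024-01-05", 120.0),
]
drawdowns = drawdown_series(curve)
max_drawdown = min(dd for _, dd in drawdowns)
```

The minimum of the series is the max drawdown figure that the summary helpers report alongside Sharpe, volatility, and turnover.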

  • Primary consumers

  • MkDocs visualizations, dashboards, and executive reports consume the data these helpers return.

  • Downstream CLI or notebook workflows can pipe these into matplotlib/plotly charts.

Universe Management — assets.universes.UniverseManager

  • Inputs
  • config/universes.yaml (definitions, constraints, return config)
  • Match report DataFrame
  • Exported prices directory from Data Prep
  • Output: dict[str, Any] with keys assets (DataFrame), classifications (DataFrame), returns (DataFrame), metadata (Series).

Returns — analytics.returns.PriceLoader

  • Input contract: price CSVs under the prices directory with the normalized header ticker,per,date,time,open,high,low,close,volume,openint and dates parsable to a time index.
  • Output: aligned price DataFrame or Series with close values; filters non‑positive closes and de‑duplicates dates.
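
The input and output contract above can be sketched with the standard library. This is not the PriceLoader implementation: `load_closes` is a hypothetical name, and keeping the first row when a date repeats is an assumption, since the contract only says dates are de-duplicated.

```python
import csv
import io
from datetime import datetime

PRICE_HEADER = ["ticker", "per", "date", "time", "open", "high",
                "low", "close", "volume", "openint"]


def load_closes(handle):
    """Return {date: close} sorted by date, dropping non-positive closes and repeated dates."""
    reader = csv.DictReader(handle)
    if reader.fieldnames != PRICE_HEADER:
        raise ValueError(f"unexpected header: {reader.fieldnames}")
    closes = {}
    for row in reader:
        close = float(row["close"])
        day = datetime.strptime(row["date"], "%Y%m%d").date()
        if close <= 0 or day in closes:  # assumed policy: first occurrence wins
            continue
        closes[day] = close
    return dict(sorted(closes.items()))


sample = io.StringIO(
    "ticker,per,date,time,open,high,low,close,volume,openint\n"
    "AAA,D,20240103,000000,1,1,1,10.5,100,0\n"
    "AAA,D,20240102,000000,1,1,1,10.0,100,0\n"
    "AAA,D,20240103,000000,1,1,1,11.0,100,0\n"
    "AAA,D,20240104,000000,1,1,1,0,100,0\n"
)
closes = load_closes(sample)
```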

Change Policy

  • Any change to flag names, required columns, or CSV schemas above is a breaking change.
  • Required steps when changing contracts:
  • Update this page.
  • Update CLI --help strings and user docs that reference the interface.
  • Update dependent tests (tests/scripts, tests/integration).
  • Note the change in memory-bank/progress.md and, if applicable, add an ADR.