Interface Contracts (Canonical)
This page is the single source of truth for CLI and file-schema contracts across the pipeline. Changes here require updating dependent docs, scripts, and tests in the same change.
Last updated: 2025-11-10
Data Preparation — scripts/prepare_tradeable_data.py
- Inputs
  - Stooq tree at --data-dir with standard .txt files (dates in YYYYMMDD).
  - Tradeable CSVs directory at --tradeable-dir containing CSVs with required columns: symbol, isin, name, market, currency.
- CLI flags (stable set)
  - I/O: --data-dir, --tradeable-dir, --metadata-output, --match-report, --unmatched-report, --prices-output.
  - Behavior: --force-reindex, --overwrite-prices, --include-empty-prices, --lse-currency-policy {broker,stooq,strict}.
  - Incremental: --incremental, --cache-metadata (skip work if the inputs are unchanged and both reports exist).
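The incremental skip condition can be pictured as a fingerprint comparison. This is a minimal sketch built on assumptions. The fingerprint (path, size, and modification time of each input), the JSON cache file, and the helper names fingerprint, save_fingerprint, and can_skip are all hypothetical and are not taken from prepare_tradeable_data.py. The real --cache-metadata logic may detect changes differently.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def fingerprint(inputs: list[Path]) -> str:
    """Hash each input's path, size, and mtime; file contents are not read (assumption)."""
    digest = hashlib.sha256()
    for path in sorted(inputs):
        stat = path.stat()
        digest.update(f"{path}|{stat.st_size}|{stat.st_mtime_ns}\n".encode())
    return digest.hexdigest()


def save_fingerprint(inputs: list[Path], cache_file: Path) -> None:
    """Record the current input fingerprint after a successful run (hypothetical cache format)."""
    cache_file.write_text(json.dumps({"fingerprint": fingerprint(inputs)}))


def can_skip(inputs: list[Path], cache_file: Path,
             match_report: Path, unmatched_report: Path) -> bool:
    """Skip only if both reports exist and the inputs match the cached fingerprint."""
    if not (match_report.is_file() and unmatched_report.is_file()):
        return False
    if not cache_file.is_file():
        return False
    cached = json.loads(cache_file.read_text()).get("fingerprint")
    return cached == fingerprint(inputs)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        source = root / "tradeable.csv"
        source.write_text("symbol,isin,name,market,currency\n")
        matches, unmatched = root / "matches.csv", root / "unmatched.csv"
        matches.write_text("")
        unmatched.write_text("")
        cache = root / "cache.json"
        save_fingerprint([source], cache)
        print(can_skip([source], cache, matches, unmatched))  # True
```

Using size and mtime avoids re-reading large Stooq trees. The cost is that edits preserving both values go unnoticed. Hashing file contents would be the stricter alternative.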
- Outputs and schemas
  - Exported prices directory: one <ticker>.csv (lowercased) per match; header ticker,per,date,time,open,high,low,close,volume,openint; rows mirror Stooq; dates YYYYMMDD.
  - Match report CSV columns (exact order): symbol, isin, market, name, currency, matched_ticker, stooq_path, region, category, strategy, source_file, price_start, price_end, price_rows, inferred_currency, resolved_currency, currency_status, data_status, data_flags.
  - Unmatched report CSV columns: symbol, isin, market, name, currency, source_file, reason.
  - Stooq index CSV columns: ticker, stem, relative_path, region, category.
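A consumer can guard against schema drift by checking the match report header against the exact column order above. This sketch uses only the standard csv module. The constant MATCH_REPORT_COLUMNS and the function check_match_report_header are hypothetical names, not part of the pipeline.

```python
import csv
import io
from typing import TextIO

# Exact column order of the match report, copied from the contract above.
MATCH_REPORT_COLUMNS = [
    "symbol", "isin", "market", "name", "currency", "matched_ticker",
    "stooq_path", "region", "category", "strategy", "source_file",
    "price_start", "price_end", "price_rows", "inferred_currency",
    "resolved_currency", "currency_status", "data_status", "data_flags",
]


def check_match_report_header(stream: TextIO) -> list[str]:
    """Return a list of header problems; an empty list means the header matches."""
    header = next(csv.reader(stream), [])
    problems = []
    missing = [c for c in MATCH_REPORT_COLUMNS if c not in header]
    extra = [c for c in header if c not in MATCH_REPORT_COLUMNS]
    if missing:
        problems.append(f"missing columns: {missing}")
    if extra:
        problems.append(f"unexpected columns: {extra}")
    # Right set of names but wrong order still breaks the "exact order" contract.
    if not missing and not extra and header != MATCH_REPORT_COLUMNS:
        problems.append("columns present but out of order")
    return problems


if __name__ == "__main__":
    good = io.StringIO(",".join(MATCH_REPORT_COLUMNS) + "\n")
    print(check_match_report_header(good))  # []
```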
- Invariants
  - Deduplicate by ticker on export; the first match wins.
  - Skip export when data_status is in {empty, missing, missing_file} or starts with error:, unless --include-empty-prices is set.
  - data_status ∈ {ok, warning, sparse, empty, missing, missing_file, error:*}; data_flags is semicolon-delimited (e.g., zero_volume_severity=high).
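The export invariants translate into a few small predicates. This sketch assumes data_status and data_flags arrive as plain Python strings. should_export, dedupe_first_wins, and parse_data_flags are hypothetical names, not functions from the pipeline.

```python
from typing import Iterable

# data_status values that suppress export per the contract above;
# any status starting with "error:" is suppressed as well.
NON_EXPORTABLE = {"empty", "missing", "missing_file"}


def should_export(data_status: str, include_empty_prices: bool = False) -> bool:
    """Apply the skip rule; --include-empty-prices overrides it."""
    if include_empty_prices:
        return True
    return data_status not in NON_EXPORTABLE and not data_status.startswith("error:")


def dedupe_first_wins(tickers: Iterable[str]) -> list[str]:
    """Keep the first occurrence of each ticker, preserving order."""
    seen: set[str] = set()
    kept = []
    for ticker in tickers:
        if ticker not in seen:
            seen.add(ticker)
            kept.append(ticker)
    return kept


def parse_data_flags(raw: str) -> dict[str, str]:
    """Split 'k=v;k2=v2' into a dict; a flag without '=' maps to an empty string."""
    flags = {}
    for item in raw.split(";"):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition("=")
        flags[key] = value
    return flags
```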
- Primary consumers
  - Asset Selection (scripts/select_assets.py): filters by data_status and data_flags and requires price_start, price_end, and price_rows.
  - Universe Manager: uses the match report plus the --prices-output directory to compute returns.
  - PriceLoader: expects the exported CSV header described above.
Asset Selection — scripts/select_assets.py
- Inputs
  - tradeable_matches.csv, the canonical match report, with columns: symbol, isin, market, name, currency, matched_ticker, stooq_path, region, category, strategy, source_file, price_start, price_end, price_rows, inferred_currency, resolved_currency, currency_status, data_status, data_flags.
  - Optional allowlist/blocklist files (one symbol/ISIN per line).
- CLI flags (stable set)
  - Core filters: --match-report, --output, --data-status, --min-history-days, --min-price-rows, --max-gap-days.
  - Additional filters: --severity, --markets, --regions, --currencies.
  - Overrides: --allowlist, --blocklist, --dry-run, --chunk-size.
  - Operational: --verbose for logging.
- Outputs
  - Selected assets CSV written to --output, preserving the match report schema but containing only the assets that passed the filters.
  - Console summary when --dry-run is set (counts and breakdowns by market/region).
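The dry-run breakdown amounts to counting the selected rows per market and per region. This minimal sketch assumes rows are dictionaries keyed by the match report columns. dry_run_summary is a hypothetical helper, and the actual console format is not reproduced here.

```python
from collections import Counter
from typing import Any, Iterable, Mapping


def dry_run_summary(rows: Iterable[Mapping[str, str]]) -> dict[str, Any]:
    """Count selected rows overall and per market/region (hypothetical helper)."""
    materialized = list(rows)  # consume the iterable once so it can be counted several times
    return {
        "total": len(materialized),
        "by_market": Counter(row["market"] for row in materialized),
        "by_region": Counter(row["region"] for row in materialized),
    }


if __name__ == "__main__":
    selected = [
        {"symbol": "AAA", "market": "LSE", "region": "uk"},
        {"symbol": "BBB", "market": "LSE", "region": "uk"},
    ]
    summary = dry_run_summary(selected)
    print(summary["total"], summary["by_market"].most_common())
```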
- Streaming invariants
  - --chunk-size > 0 enables streaming mode (process_chunked), which bounds memory usage while still enforcing allowlist validation across all chunks.
  - Without --chunk-size, the entire match report is loaded and passed through AssetSelector.select_assets() (traditional behavior).
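Streaming mode can be sketched with the standard library. The approach is to read a fixed number of rows at a time, write the surviving rows immediately, and keep only a small set of allowlist hits across chunks. This is an illustration, not process_chunked itself. It assumes that allowlist validation means reporting allowlisted symbols that never appear in any chunk. select_streaming and iter_chunks are hypothetical names.

```python
import csv
import io
from itertools import islice
from typing import Iterator, TextIO


def iter_chunks(reader: csv.DictReader, chunk_size: int) -> Iterator[list[dict[str, str]]]:
    """Yield lists of at most chunk_size rows so only one chunk is held in memory."""
    while True:
        chunk = list(islice(reader, chunk_size))
        if not chunk:
            return
        yield chunk


def select_streaming(src: TextIO, dst: TextIO, chunk_size: int,
                     allowed_status: set[str], allowlist: set[str]) -> set[str]:
    """Filter rows by data_status chunk by chunk, writing matches to dst.

    Returns allowlisted symbols never seen in any chunk, so the check spans the
    whole file rather than a single chunk (assumed meaning of "validation").
    """
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames or [])
    writer.writeheader()
    seen: set[str] = set()
    for chunk in iter_chunks(reader, chunk_size):
        for row in chunk:
            if row["symbol"] in allowlist:
                seen.add(row["symbol"])
            if row["data_status"] in allowed_status:
                writer.writerow(row)
    return allowlist - seen


if __name__ == "__main__":
    src = io.StringIO("symbol,data_status\nAAA,ok\nBBB,empty\n")
    dst = io.StringIO()
    print(select_streaming(src, dst, 1, {"ok"}, {"AAA", "ZZZ"}))
```

The allowlist check is evaluated only after the last chunk. As a result, a symbol that appears late in the file is not falsely reported as missing, which a per-chunk check would do.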
- Primary consumers
  - scripts/classify_assets.py and the universe loaders expect the selected assets CSV plus its data_status/data_flags diagnostics.
  - Downstream builders rely on price_start, price_end, and price_rows for history assertions and constraints.
Asset Classification — scripts/classify_assets.py
- Inputs
  - Selected assets CSV produced by select_assets.py.
  - Optional overrides CSV keyed by symbol/isin.
- CLI flags (stable set)
  - --input, --output, --overrides, --export-for-review, --summary, --verbose.
- Outputs
  - Classified assets CSV at --output, containing the original columns plus asset_class, sub_class, geography, confidence.
  - Optional review template (--export-for-review) for manual overrides.
  - Console summary (--summary) reporting the class/geography breakdown and low-confidence rows.
- Invariants
  - Manual overrides take precedence, and overridden rows carry a confidence of 1.0.
  - The classifier appends at least the asset_class, sub_class, geography, and confidence columns for every asset.
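The override precedence rule can be sketched as a dictionary merge. The sketch makes three assumptions. Overrides are looked up by symbol first and then by isin. An override may set only some of the classification fields, with the classifier's guess filling the rest. classify_row is a hypothetical name, not part of classify_assets.py.

```python
from typing import Any

# Columns the classifier must append for every asset (from the contract above).
CLASSIFICATION_COLUMNS = ("asset_class", "sub_class", "geography", "confidence")


def classify_row(asset: dict[str, Any], inferred: dict[str, Any],
                 overrides: dict[str, dict[str, Any]]) -> dict[str, Any]:
    """Merge the classifier's guess with a manual override, if one exists.

    Lookup order (symbol, then isin) is an assumption for this sketch.
    """
    override = overrides.get(asset["symbol"]) or overrides.get(asset["isin"])
    if override is not None:
        # Override fields win; confidence is pinned to 1.0 for manual entries.
        result = {**inferred, **override, "confidence": 1.0}
    else:
        result = dict(inferred)
    missing = [c for c in CLASSIFICATION_COLUMNS if c not in result]
    if missing:
        raise ValueError(f"classification missing columns: {missing}")
    # Keep the original columns and append the classification columns.
    return {**asset, **{c: result[c] for c in CLASSIFICATION_COLUMNS}}
```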
- Primary consumers
  - calculate_returns.py expects the classified assets schema plus diagnostics for filtering/history coverage.
  - Override artifacts feed future classification runs or documentation.
Returns — scripts/calculate_returns.py
- Inputs
  - Classified assets CSV (required) with price_rows, stooq_path, and coverage metadata.
  - Cleaned price CSV directory (--prices-dir) from data preparation.
- CLI flags (stable set)
  - --assets, --prices-dir, --output, --method, --frequency, --handle-missing, --align-method, --cache-size, --loader-workers, --io-backend.
  - Tuning/logging: --risk-free-rate, --max-forward-fill, --min-periods, --min-coverage, --business-days, --summary, --top, --verbose.
- Outputs
  - Returns matrix CSV (wide format) at --output, with dates as rows and asset tickers as columns.
  - Optional text summary when --summary is provided.
- Invariants
  - Alignment via --align-method (outer/inner) happens before missing-data handling.
  - Coverage filtering removes assets that fail --min-coverage.
  - --cache-size trades memory for performance; a value of 0 disables caching.
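The order of operations (align first, then filter on coverage) can be sketched without pandas. The sketch assumes that each ticker's returns arrive as a date-to-value dictionary. It also assumes that coverage means the share of aligned dates that have a value, and that the threshold comparison is inclusive. align and filter_by_coverage are hypothetical helpers. Missing values are left as None for the later --handle-missing step, which is not shown.

```python
from typing import Optional

ReturnSeries = dict[str, float]  # ISO date string -> return (assumed input shape)
Aligned = dict[str, list[Optional[float]]]


def align(series: dict[str, ReturnSeries], method: str) -> tuple[list[str], Aligned]:
    """Align all tickers on one date index: outer = union, inner = intersection."""
    date_sets = [set(s) for s in series.values()]
    if not date_sets:
        return [], {}
    if method == "outer":
        dates = sorted(set().union(*date_sets))
    elif method == "inner":
        dates = sorted(set.intersection(*date_sets))
    else:
        raise ValueError(f"unknown align method: {method}")
    # Missing observations stay None until missing-data handling runs.
    return dates, {t: [s.get(d) for d in dates] for t, s in series.items()}


def filter_by_coverage(aligned: Aligned, min_coverage: float) -> Aligned:
    """Drop tickers whose share of non-missing aligned values is below min_coverage."""
    kept = {}
    for ticker, values in aligned.items():
        coverage = sum(v is not None for v in values) / len(values) if values else 0.0
        if coverage >= min_coverage:
            kept[ticker] = values
    return kept
```

In pandas terms, outer and inner alignment correspond to concatenating the per-ticker series with join="outer" or join="inner".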
- Primary consumers
  - Portfolio construction scripts expect the returns matrix and any summary logs for diagnostics.
Universe Management — scripts/manage_universes.py
- Inputs
  - config/universes.yaml (required): the central blueprint for the service pipeline, defining the filter_criteria, classification_requirements, return_config, and constraints blocks plus optional advanced blocks.
  - data/metadata/tradeable_matches.csv and data/processed/tradeable_prices, used for selection/classification/returns when loading a universe.
- CLI commands (stable)
  - Legacy: list, show, load, validate, compare.
  - Service commands: validate-file and compare-files; load <name> (also listed under legacy) orchestrates the asset selection → classification → returns pipeline.
  - Shared flags: --config, --matches, --prices-dir, --verbose, --output-dir, --status.
- Outputs
  - Execution logs plus selected/classified/returns artifacts under the optional --output-dir.
  - Validation/comparison reports printed to the console when running validate-file/compare-files.
- Invariants
  - Universe definitions must include the core blocks before advanced ones; validation ensures the required keys exist.
  - Loading a universe applies selection → classification → returns in that order, honoring the config blocks.
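Here is a minimal sketch of the validate-then-load sequence. It assumes that the four core blocks named under Inputs are the required keys, and that each pipeline stage is a plain callable. validate_universe, load_universe, and CORE_BLOCKS are hypothetical and do not reproduce the service's API.

```python
from typing import Any, Callable

# Assumed required keys: the core blocks named in the inputs above.
CORE_BLOCKS = ("filter_criteria", "classification_requirements", "return_config", "constraints")


def validate_universe(name: str, definition: dict[str, Any]) -> list[str]:
    """Return one error message per core block missing from a universe definition."""
    return [f"{name}: missing '{block}'" for block in CORE_BLOCKS if block not in definition]


def load_universe(name: str, definition: dict[str, Any],
                  select: Callable[[Any], Any],
                  classify: Callable[[Any, Any], Any],
                  compute_returns: Callable[[Any, Any], Any]) -> dict[str, Any]:
    """Validate first, then run selection -> classification -> returns in order."""
    errors = validate_universe(name, definition)
    if errors:
        raise ValueError("; ".join(errors))
    selected = select(definition["filter_criteria"])
    classified = classify(selected, definition["classification_requirements"])
    returns = compute_returns(classified, definition["return_config"])
    return {"assets": selected, "classifications": classified, "returns": returns}
```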
- Primary consumers
  - Portfolio construction/backtesting scripts rely on the universes produced by the service (selected set, classification metadata, returns).
Portfolio Construction — scripts/construct_portfolio.py
- Inputs
  - Returns matrix CSV (--returns).
  - Optional classifications CSV for constraint-aware strategies (--classifications).
- CLI flags (stable)
  - --strategy (equal_weight/risk_parity/mean_variance variants), --returns, --output.
  - Constraints: --max-weight, --min-weight, --max-equity, --min-bond, etc.
  - Comparison: --compare, --strategy overrides.
- Outputs
  - Single-strategy weights CSV (--output); this portfolio weights CSV is the direct input to the run_backtest flow.
  - Comparison CSV with per-strategy weights when --compare is supplied.
- Invariants
  - Constraints are validated after optimization; on failure, the script falls back to safer defaults.
  - Weight columns are normalized to sum to 1 (100% allocation).
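The normalization and post-optimization check can be sketched as follows. The equal-weight fallback is an assumption standing in for the unspecified "safer defaults". normalize_weights, constraint_violations, and finalize_weights are hypothetical names.

```python
from typing import Optional


def normalize_weights(weights: dict[str, float]) -> dict[str, float]:
    """Scale raw weights so they sum to 1 (100% allocation)."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return {ticker: w / total for ticker, w in weights.items()}


def constraint_violations(weights: dict[str, float],
                          max_weight: Optional[float] = None,
                          min_weight: Optional[float] = None,
                          tol: float = 1e-9) -> list[str]:
    """List per-asset bound violations and a sum-to-one violation, if any."""
    problems = []
    for ticker, w in weights.items():
        if max_weight is not None and w > max_weight + tol:
            problems.append(f"{ticker}: {w:.4f} > max {max_weight}")
        if min_weight is not None and w < min_weight - tol:
            problems.append(f"{ticker}: {w:.4f} < min {min_weight}")
    if abs(sum(weights.values()) - 1.0) > 1e-6:
        problems.append("weights do not sum to 1")
    return problems


def finalize_weights(raw: dict[str, float],
                     max_weight: Optional[float] = None,
                     min_weight: Optional[float] = None) -> dict[str, float]:
    """Normalize, validate, and fall back to equal weight (assumed safer default)."""
    weights = normalize_weights(raw)
    if constraint_violations(weights, max_weight, min_weight):
        return {ticker: 1.0 / len(weights) for ticker in weights}
    return weights
```

Equal weight can itself breach a tight --max-weight when there are only a few assets. A real fallback may therefore need its own check.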
- Primary consumers
  - run_backtest.py consumes the weights CSV to simulate trades.
  - Reporting/visualization expects each strategy's weight distribution for comparison tables.
Backtesting — scripts/run_backtest.py
- Inputs
  - Strategy name plus returns/prices/universe inputs (--returns-file, --prices-file, --universe-file).
  - Capital/config (--initial-capital, --start-date, --end-date).
- CLI flags/key behaviors
  - Rebalance & execution: --rebalance-frequency, --commission, --slippage, --rebalance-method.
  - Advanced filters: --preselect-method, --use-pit-eligibility, --membership-enabled.
  - Output control: --output-dir, --save-trades, --no-visualize.
  - Logging: --verbose, and --dry-run if present (not guaranteed; confirm with --help before relying on it).
- Outputs
  - Configuration/metrics JSON: config.json, summary_report.json, metrics.json.
  - Core CSVs: equity_curve.csv (essential for visualization), trades.csv, and additional visualization CSVs.
  - Optional visual artifacts: rolling metrics, drawdowns, cost breakdowns.
- Invariants
  - Sequence: load universe → optionally apply PIT/preselection/membership → execute strategy → simulate trades with transaction costs.
  - All outputs are namespaced under --output-dir.
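A post-run check that the expected artifacts landed under --output-dir could look like the sketch below. It assumes that artifacts may sit in any subdirectory of the output directory, and that trades.csv is expected only when --save-trades is passed. missing_outputs and CORE_OUTPUTS are hypothetical names.

```python
import tempfile
from pathlib import Path

# Artifacts named in the backtest outputs contract; trades.csv is treated as
# optional here on the assumption that it depends on --save-trades.
CORE_OUTPUTS = ("config.json", "summary_report.json", "metrics.json", "equity_curve.csv")


def missing_outputs(output_dir: Path, expect_trades: bool = False) -> list[str]:
    """Return expected artifact names not found as files anywhere under output_dir."""
    expected = list(CORE_OUTPUTS) + (["trades.csv"] if expect_trades else [])
    return [
        name for name in expected
        if not any(path.is_file() for path in output_dir.rglob(name))
    ]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        run_dir = Path(tmp) / "run"
        run_dir.mkdir()
        (run_dir / "config.json").write_text("{}")
        print(missing_outputs(Path(tmp)))
```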
- Primary consumers
  - Visualization/reporting modules consume the CSV outputs.
  - Governance/tracking systems rely on config.json plus metrics for review.
Visualization & Reporting — src/portfolio_management/reporting/visualization/
- Inputs
  - Backtest result objects (equity curve, rebalance events, trades, metrics).
- API contract
  - Functions such as prepare_equity_curve, prepare_drawdown_series, create_summary_report, and prepare_allocation_history return pandas data structures ready for plotting.
  - They expect equity/trade DataFrames indexed by date with columns equity, trades, returns.
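The drawdown helpers rest on a standard definition: drawdown is equity divided by its running peak, minus one. The sketch below uses (date, equity) pairs so that the output dates stay aligned with the input. It is not the prepare_drawdown_series implementation, and normalize_equity, drawdown_series, and max_drawdown are hypothetical names. With pandas, the same series is commonly written as equity / equity.cummax() - 1.

```python
Curve = list[tuple[str, float]]  # (date, equity) pairs; equity assumed positive


def normalize_equity(curve: Curve) -> Curve:
    """Rescale equity so the first point equals 1.0, keeping every date."""
    if not curve:
        return []
    base = curve[0][1]
    return [(day, value / base) for day, value in curve]


def drawdown_series(curve: Curve) -> Curve:
    """Drawdown per date: equity / running peak - 1 (0.0 at new highs)."""
    peak = float("-inf")
    out = []
    for day, value in curve:
        peak = max(peak, value)
        out.append((day, value / peak - 1.0))
    return out


def max_drawdown(curve: Curve) -> float:
    """Most negative drawdown; 0.0 for an empty or never-declining curve."""
    return min((dd for _, dd in drawdown_series(curve)), default=0.0)
```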
- Outputs
  - DataFrames/CSV-ready dictionaries for normalized equity, drawdowns, summaries, allocations, comparisons, transaction costs, and rolling metrics.
  - Chart-ready artifacts for docs, dashboards, and notebooks; docs/examples and downstream CLI helpers use them for rendering.
- Invariants
  - Equity/drawdown helpers always return time series with aligned dates.
  - Summary/report helpers include standard fields (Sharpe, volatility, max drawdown, turnover) for parity with summary_report.json.
- Primary consumers
  - MkDocs visualizations, dashboards, and executive reports consume the returned data.
  - Downstream CLI or notebook workflows can pipe these structures into matplotlib/plotly charts.
Universe Management — assets.universes.UniverseManager
- Inputs
  - config/universes.yaml (definitions, constraints, return config)
  - Match report DataFrame
  - Exported prices directory from Data Prep
- Output: dict[str, Any] with keys assets (DataFrame), classifications (DataFrame), returns (DataFrame), metadata (Series).
Returns — analytics.returns.PriceLoader
- Input contract: price CSVs under the prices directory with the normalized header ticker,per,date,time,open,high,low,close,volume,openint and dates parsable to a time index.
- Output: aligned price DataFrame or Series of close values; non-positive closes are filtered out and duplicate dates are removed.
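The PriceLoader contract (normalized header, non-positive closes dropped, one row per date) can be sketched with the csv module. The sketch assumes dates are in YYYYMMDD form as exported by data preparation, and that the first row wins when a date repeats. load_closes is a hypothetical name, not the PriceLoader API.

```python
import csv
import io
from datetime import date, datetime
from typing import TextIO

# Normalized export header from the data preparation contract.
PRICE_HEADER = ["ticker", "per", "date", "time", "open", "high", "low", "close", "volume", "openint"]


def load_closes(stream: TextIO) -> dict[date, float]:
    """Return date -> close, sorted by date, with bad closes and duplicate dates removed."""
    reader = csv.DictReader(stream)
    if reader.fieldnames != PRICE_HEADER:
        raise ValueError(f"unexpected header: {reader.fieldnames}")
    closes: dict[date, float] = {}
    for row in reader:
        day = datetime.strptime(row["date"], "%Y%m%d").date()
        close = float(row["close"])
        if close <= 0:
            continue  # drop non-positive closes
        closes.setdefault(day, close)  # keep the first row per date (assumed policy)
    return dict(sorted(closes.items()))


if __name__ == "__main__":
    text = ",".join(PRICE_HEADER) + "\naaa.us,D,20240102,000000,1,1,1,10.0,100,0\n"
    print(load_closes(io.StringIO(text)))
```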
Change Policy
- Any change to flag names, required columns, or CSV schemas above is a breaking change.
- Required steps when changing contracts:
  - Update this page.
  - Update CLI --help strings and user docs that reference the interface.
  - Update dependent tests (tests/scripts, tests/integration).
  - Note the change in memory-bank/progress.md and, if applicable, add an ADR.