Architecture Documentation¶
This directory contains comprehensive architecture and workflow documentation for the Portfolio Management Toolkit.
Core Documents¶
COMPLETE_WORKFLOW.md¶
The definitive reference for understanding the entire system.
This document contains:
- Complete Mermaid workflow diagram showing all data flows
- Detailed descriptions of every component
- All feature integrations and data paths
- CLI command reference
- Examples and use cases
- Troubleshooting guide
Start here for comprehensive system understanding.
INTERFACE_CONTRACTS.md¶
Canonical interface and schema contracts for all pipeline stages (CLI flags, CSV schemas, invariants, and consumers). Treat this as the single source of truth for interfaces.
Architecture Overview¶
System Type¶
Modular Monolith - Single codebase with clear module boundaries
Design Principles¶
- Offline-First
  - Works with cached data
  - No external API dependencies during execution
  - Reproducible workflows
- Modular Pipeline
  - Each stage is independent and composable
  - Clear input/output contracts
  - Can be run individually or orchestrated
- Configuration-Driven
  - YAML-based universe definitions
  - CLI flags for runtime parameters
  - Version-controlled configurations
- Production-Ready
  - 200+ automated tests
  - Comprehensive error handling
  - Performance optimized (caching, vectorization)
  - Defensive validation
Core Workflow Stages¶
CSV Files → Data Prep → Selection → Classification → Returns → Portfolio → Backtest → Visualization
Detailed Breakdown:
- Data Preparation (prepare_tradeable_data.py)
  - Ingest Stooq CSV files
  - Match instruments across venues
  - Validate data quality (9+ flags)
  - Features: Incremental resume, fast I/O
- Asset Selection (select_assets.py)
  - Filter by liquidity, price, market cap
  - Apply allow/block lists
  - Optional: Factor preselection
- Asset Classification (classify_assets.py)
  - Geographic classification
  - Asset type classification
  - Override support for corrections
- Return Calculation (calculate_returns.py)
  - Compute log or simple returns
  - Handle missing data
  - Ensure point-in-time integrity
- Universe Management (manage_universes.py)
  - Define universes in YAML
  - Orchestrate pipeline stages
  - Validate configurations
- Portfolio Construction (construct_portfolio.py)
  - Three strategies: Equal Weight, Risk Parity, Mean-Variance
  - Apply constraints (weights, asset classes)
  - Optional: Statistics caching
- Backtesting (run_backtest.py)
  - Simulate historical performance
  - Model transaction costs
  - Optional: PIT eligibility, preselection, membership policy
  - Generate comprehensive results
- Visualization & Reporting
  - Equity curves, drawdowns, distributions
  - Performance metrics
  - Interactive HTML dashboards
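The return calculation stage distinguishes log returns from simple returns. The difference can be sketched as follows, assuming a plain list of closing prices; the function names `simple_returns` and `log_returns` are illustrative and are not taken from calculate_returns.py.

```python
import math


def simple_returns(prices):
    """Simple returns: r_t = p_t / p_{t-1} - 1."""
    return [curr / prev - 1.0 for prev, curr in zip(prices, prices[1:])]


def log_returns(prices):
    """Log returns: r_t = ln(p_t / p_{t-1})."""
    return [math.log(curr / prev) for prev, curr in zip(prices, prices[1:])]


prices = [100.0, 110.0, 99.0]
simple = simple_returns(prices)
logs = log_returns(prices)

# Log returns add up across periods; simple returns compound.
total_from_logs = math.exp(sum(logs)) - 1.0
total_from_simple = (1.0 + simple[0]) * (1.0 + simple[1]) - 1.0
```

Both totals describe the same move from 100 to 99. Log returns are convenient when aggregating over time, while simple returns aggregate naturally across assets within a portfolio.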
Advanced Features¶
Performance Optimization:
- Incremental resume (3-5 min → 2-3 sec)
- Fast I/O with Polars/PyArrow (2-5× speedup)
- Statistics caching (5-10× speedup for rebalancing)
- Vectorization (45-206× speedup for selection)
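This page does not describe how incremental resume is implemented. One common approach, sketched here under that caveat, is to store a cheap fingerprint per input file and skip any file whose fingerprint is unchanged. The names `fingerprint`, `files_needing_work`, and `resume_state.json` are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def fingerprint(path):
    """Cheap change detector: file size and modification time in nanoseconds."""
    stat = path.stat()
    return [stat.st_size, stat.st_mtime_ns]


def files_needing_work(paths, state_file):
    """Return paths whose fingerprint differs from the stored one, then save the new state."""
    previous = json.loads(state_file.read_text()) if state_file.exists() else {}
    current = {str(p): fingerprint(p) for p in paths}
    changed = [p for p in paths if previous.get(str(p)) != current[str(p)]]
    state_file.write_text(json.dumps(current))
    return changed


with tempfile.TemporaryDirectory() as tmp:
    data = Path(tmp) / "aaa.csv"
    data.write_text("date,close\n2024-01-02,100\n")
    state = Path(tmp) / "resume_state.json"

    first_run = files_needing_work([data], state)    # nothing stored yet
    second_run = files_needing_work([data], state)   # unchanged, so skipped
    data.write_text("date,close\n2024-01-02,100\n2024-01-03,101\n")
    third_run = files_needing_work([data], state)    # size changed, so redone
```

Size plus modification time is fast but not foolproof; a content hash is a stricter alternative at the cost of reading every file.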
Risk Management:
- Point-in-time eligibility (prevent lookahead bias)
- Membership policy (control turnover)
- Weight constraints (enforce diversification)
- Asset class limits (allocation guardrails)
- Transaction cost modeling
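Two of these guardrails can be illustrated with a small sketch: a point-in-time eligibility filter that only uses information available on the rebalance date, and a proportional transaction cost charged on turnover. The function names, the calendar-day history rule, and the basis-point cost model are assumptions made for illustration, not the toolkit's actual rules.

```python
from datetime import date


def eligible_assets(first_trade_dates, as_of, min_history_days):
    """Tickers whose history, as known on `as_of`, is long enough to trade."""
    return sorted(
        ticker
        for ticker, first_date in first_trade_dates.items()
        if (as_of - first_date).days >= min_history_days
    )


def rebalance_cost(old_weights, new_weights, portfolio_value, cost_bps):
    """Proportional cost: traded notional times a rate quoted in basis points."""
    tickers = set(old_weights) | set(new_weights)
    turnover = sum(
        abs(new_weights.get(t, 0.0) - old_weights.get(t, 0.0)) for t in tickers
    )
    return turnover * portfolio_value * cost_bps / 10_000.0


first_dates = {
    "OLD": date(2015, 1, 2),
    "NEW": date(2023, 11, 1),     # under a year of history on the as-of date
    "FUTURE": date(2024, 6, 3),   # not yet listed on the as-of date
}
universe = eligible_assets(first_dates, as_of=date(2024, 1, 31), min_history_days=365)

cost = rebalance_cost(
    {"AAA": 0.5, "BBB": 0.5},
    {"AAA": 0.25, "BBB": 0.25, "CCC": 0.5},
    portfolio_value=100_000.0,
    cost_bps=10.0,
)
```

Filtering on the as-of date keeps assets that list later out of earlier rebalances, which is the lookahead bias the first bullet refers to.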
Factor & Signal Features:
- Momentum preselection (top-K by returns)
- Low-volatility preselection (K lowest-volatility assets)
- Combined factor scoring
- Technical indicators (stub - future)
- Macro signals (stub - future)
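Momentum preselection can be sketched as a rank-and-truncate step. This is a minimal illustration that assumes momentum is the cumulative return over the lookback window; `momentum_score` and `preselect_top_k` are hypothetical names, and the toolkit's scoring may differ.

```python
def momentum_score(prices):
    """Cumulative return over the lookback window: last price / first price - 1."""
    return prices[-1] / prices[0] - 1.0


def preselect_top_k(price_history, k):
    """Rank tickers by momentum (highest first) and keep the top k."""
    scores = {ticker: momentum_score(p) for ticker, p in price_history.items()}
    ranked = sorted(scores, key=lambda t: (-scores[t], t))  # ticker name breaks ties
    return ranked[:k]


history = {
    "AAA": [100.0, 120.0],
    "BBB": [50.0, 45.0],
    "CCC": [10.0, 11.0],
}
selected = preselect_top_k(history, k=2)
```

A low-volatility variant would use the same structure with the sort reversed, keeping the K assets with the smallest volatility. In a backtest, the price window passed in should end before the rebalance date so the ranking stays point-in-time.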
Auto-generated Diagram & Layout¶
Auto-generated architecture diagram¶
Layered module tree¶
scripts/ # CLI entry points
src/portfolio_management/
├── core/ # Foundation (exceptions, config, utilities)
├── data/ # Data management (I/O, ingestion, analysis)
├── assets/ # Asset universe (selection, classification)
├── analytics/ # Financial analytics (returns, metrics)
├── macro/ # Macroeconomic signals & regime gating
├── portfolio/ # Portfolio construction (strategies, constraints)
├── backtesting/ # Backtesting engine (simulation, transactions)
└── reporting/ # Reporting & visualization
Generated via python scripts/generate_arch_diagram.py.
Module Structure¶
src/portfolio_management/
├── core/ # Exceptions, config, types, utilities
├── data/ # Ingestion, I/O, matching, analysis
├── assets/ # Selection, classification, universes
├── analytics/ # Returns, metrics, indicators
├── macro/ # Macro signals, regime detection
├── portfolio/ # Strategies, constraints, membership
├── backtesting/ # Engine, transactions, performance
└── reporting/ # Visualization, exporters
Data Flow Patterns¶
Managed Workflow (Recommended):
1. prepare_tradeable_data.py → tradeable_matches.csv
2. Edit config/universes.yaml
3. manage_universes.py load <universe> → Auto-pipeline
4. construct_portfolio.py → weights.csv
5. run_backtest.py → Results + visualizations
Manual Workflow (Debug/Experiment):
1. prepare_tradeable_data.py
2. select_assets.py
3. classify_assets.py
4. calculate_returns.py
5. construct_portfolio.py
6. run_backtest.py
Technology Stack¶
Core:
- Python 3.12 (minimum 3.10)
- pandas 2.3+, numpy 2.0+, scipy 1.3+
- JAX 0.4+ (for numerical computations)
Portfolio Optimization:
- PyPortfolioOpt 1.5+ (mean-variance)
- riskparityportfolio 0.2+ (risk parity)
- cvxpy 1.1+ (convex optimization)
Performance:
- Polars (optional fast I/O)
- PyArrow (optional fast I/O)
Analytics:
- empyrical-reloaded 0.5+ (performance metrics)
- Plotly 5.0+ (interactive visualization)
Development:
- pytest 8.4+ (testing)
- black 25.9 (formatting)
- ruff 0.14 (linting)
- mypy 1.18 (type checking)
Testing Strategy¶
200+ Tests covering:
- Unit tests for all modules
- Integration tests for pipeline stages
- CLI tests for scripts
- Performance smoke tests
- Edge case handling
- Caching correctness
Test Organization:
tests/
├── core/ # Core utilities
├── data/ # Data pipeline
├── assets/ # Selection & classification
├── analytics/ # Returns & metrics
├── portfolio/ # Strategies & constraints
├── backtesting/ # Engine & performance
├── reporting/ # Visualization
├── integration/ # End-to-end tests
└── scripts/ # CLI tests
Configuration Management¶
Primary Configuration:
- config/universes.yaml: Universe definitions
- config/*.yaml: Strategy-specific configurations
- pyproject.toml: Project metadata & tool configs
- pytest.ini: Test configuration
- mypy.ini: Type checking configuration
Runtime Configuration:
- CLI flags for all scripts
- Environment variables for system paths
- .cache/: Incremental resume metadata
Error Handling¶
Exception Hierarchy:
PortfolioManagementError (base)
├── DataError
│   ├── FileNotFoundError
│   ├── DataValidationError
│   └── DataQualityError
├── ConfigError
│   ├── ConfigValidationError
│   └── MissingConfigError
├── OptimizationError
│   ├── InfeasibleConstraintsError
│   └── SolverFailureError
└── BacktestError
    ├── InsufficientDataError
    └── RebalanceError
Error Handling Strategy:
- Validate early (fail fast)
- Provide actionable error messages
- Include context (file paths, parameter values)
- Log warnings for non-critical issues
- Raise exceptions for critical failures
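The hierarchy and the fail-fast strategy above can be sketched as plain Python classes. The class names come from the tree; the docstrings, the `validate_max_weight` helper, and its messages are illustrative assumptions. `FileNotFoundError` is left out of the sketch because a class with that name would shadow Python's builtin inside this snippet.

```python
class PortfolioManagementError(Exception):
    """Base class: catching this catches every toolkit error."""


class DataError(PortfolioManagementError): ...
class DataValidationError(DataError): ...
class DataQualityError(DataError): ...

class ConfigError(PortfolioManagementError): ...
class ConfigValidationError(ConfigError): ...
class MissingConfigError(ConfigError): ...

class OptimizationError(PortfolioManagementError): ...
class InfeasibleConstraintsError(OptimizationError): ...
class SolverFailureError(OptimizationError): ...

class BacktestError(PortfolioManagementError): ...
class InsufficientDataError(BacktestError): ...
class RebalanceError(BacktestError): ...


def validate_max_weight(max_weight, n_assets):
    """Fail fast, before any optimisation runs, and include the offending values."""
    if not 0.0 < max_weight <= 1.0:
        raise ConfigValidationError(f"max_weight must be in (0, 1], got {max_weight!r}")
    if max_weight * n_assets < 1.0:
        raise InfeasibleConstraintsError(
            f"max_weight={max_weight} with {n_assets} assets cannot sum to 1"
        )


try:
    validate_max_weight(0.2, n_assets=4)  # 4 assets capped at 20% reach only 80%
except PortfolioManagementError as exc:
    caught = type(exc).__name__
    message = str(exc)
```

Callers that only care about one stage can catch the intermediate class (for example `OptimizationError`), while a CLI entry point can catch the base class once and print the message.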
Performance Characteristics¶
Data Preparation:
- First run: 3-5 minutes (10,000 files)
- Subsequent runs: 2-3 seconds (with incremental resume)
- Fast I/O: 2-5× speedup for large datasets
Asset Selection:
- Vectorized: 45-206× faster than iterative
- 10,000 assets: <1 second
Portfolio Construction:
- Equal Weight: O(n) - instant
- Risk Parity: O(n²) - seconds to minutes
- Mean-Variance: O(n³) - minutes for large universes
- With caching: 5-10× speedup for rebalancing
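The cost gap between Equal Weight and the other strategies is easier to see in code. The sketch below shows equal weighting next to inverse-volatility weighting, which is a naive stand-in for risk parity that ignores correlations; it is not the riskparityportfolio solver, and the function names are hypothetical.

```python
import statistics


def equal_weights(tickers):
    """O(n): every asset gets 1/n, no return data needed."""
    n = len(tickers)
    return {t: 1.0 / n for t in tickers}


def inverse_volatility_weights(returns_by_ticker):
    """Weights proportional to 1/volatility (assumes every volatility is non-zero)."""
    inv_vol = {t: 1.0 / statistics.stdev(r) for t, r in returns_by_ticker.items()}
    total = sum(inv_vol.values())
    return {t: v / total for t, v in inv_vol.items()}


returns = {
    "LOW": [0.01, -0.01, 0.01, -0.01],
    "HIGH": [0.02, -0.02, 0.02, -0.02],   # twice as volatile as LOW
}
ew = equal_weights(list(returns))
iv = inverse_volatility_weights(returns)
```

Here the less volatile asset receives roughly twice the weight of the more volatile one. Full risk parity and mean-variance optimisation additionally need the covariance matrix, which is where the higher-order costs listed above come from.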
Backtesting:
- 10-year backtest, 50 assets, monthly rebalancing: <10 seconds
- 10-year backtest, 300 assets, monthly rebalancing: <60 seconds
- With preselection: 10-20× faster for large universes
Memory Management¶
Constraints:
- Repository: 71,379+ files (70,420+ data files)
- All tools configured to exclude the data/ directory
- Bounded caches with LRU eviction
Memory Optimization:
- Streaming processing for large datasets
- Bounded caches (default 1000 entries)
- 70-90% memory savings vs. unbounded caching
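A bounded cache with LRU eviction can be sketched with the standard library's `OrderedDict`. The class name `BoundedLRUCache` and its interface are illustrative; only the default of 1000 entries is taken from the text above.

```python
from collections import OrderedDict


class BoundedLRUCache:
    """Keeps at most `max_entries` items, evicting the least recently used one."""

    def __init__(self, max_entries=1000):
        if max_entries < 1:
            raise ValueError("max_entries must be at least 1")
        self._max_entries = max_entries
        self._entries = OrderedDict()

    def get(self, key, default=None):
        if key not in self._entries:
            return default
        self._entries.move_to_end(key)  # mark as most recently used
        return self._entries[key]

    def put(self, key, value):
        self._entries[key] = value
        self._entries.move_to_end(key)
        if len(self._entries) > self._max_entries:
            self._entries.popitem(last=False)  # drop the least recently used

    def __len__(self):
        return len(self._entries)


cache = BoundedLRUCache(max_entries=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")       # "a" becomes the most recently used entry
cache.put("c", 3)    # evicts "b", the least recently used entry
```

For pure functions with hashable arguments, `functools.lru_cache(maxsize=...)` gives the same bound with less code; an explicit class is useful when entries must be inserted or inspected directly.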
Future Roadmap¶
Stub Features (Infrastructure Complete):
- Cardinality Constraints
  - MIQP solver integration
  - Heuristic approximations
  - Limit portfolio to K positions
- Technical Indicators
  - TA-Lib integration
  - Configurable indicators (RSI, MACD, MA)
  - Signal combination logic
- Macro Signals
  - Regime detection (recession, risk-off)
  - Asset class gating by regime
  - Score adjustments
Planned Enhancements:
- Black-Litterman views integration
- News/sentiment factor overlays
- Multi-period optimization
- Risk budgeting constraints
- ESG filtering
Documentation Map¶
Getting Started:
- README.md - Project overview
- QUICKSTART.md - 5-minute setup
- workflow.md - Workflow overview
Module Guides:
- data_preparation.md
- asset_selection.md
- asset_classification.md
- calculate_returns.md
- universes.md
- portfolio_construction.md
- backtesting.md
Advanced Features:
Reference:
Architecture:
- COMPLETE_WORKFLOW.md - Complete system diagram
Memory Bank (Agent Context)¶
For AI agents working on this project:
- AGENTS.md - Agent operating instructions
- memory-bank/ - Persistent context
  - projectbrief.md - Project overview
  - productContext.md - User needs & use cases
  - systemPatterns.md - Architecture patterns
  - techContext.md - Technical stack
  - activeContext.md - Current development status
  - progress.md - Development history
Contributing¶
When adding new features:
- Follow Module Boundaries: Keep concerns separated
- Add Tests: Maintain >80% coverage
- Update Documentation: Especially COMPLETE_WORKFLOW.md
- Validate Configuration: Add YAML schema validation
- Handle Errors: Use exception hierarchy
- Optimize Performance: Profile before optimizing
- Cache Wisely: Use bounded caches with LRU
Support¶
For questions or issues:
- Check troubleshooting.md
- Review COMPLETE_WORKFLOW.md
- Consult module-specific documentation
- Check test cases for usage examples
Last Updated: October 25, 2025