
First Steps Tutorial¶

Learn the basics of the Portfolio Management Toolkit with a hands-on tutorial.

What You'll Build¶

In this tutorial, you'll:

  1. ✅ Load and prepare sample data
  2. ✅ Create a simple universe of assets
  3. ✅ Construct an equal-weight portfolio
  4. ✅ Run a backtest
  5. ✅ Analyze the results

Time Required: 10-15 minutes

Prerequisites¶

Before starting, ensure you've completed Installation.

Tutorial Steps¶

Step 1: Understand the Workflow¶

The typical workflow is:

graph LR
    A[Raw Data] --> B[Prepare Data]
    B --> C[Select Assets]
    C --> D[Calculate Returns]
    D --> E[Construct Portfolio]
    E --> F[Backtest]
    F --> G[Visualize Results]

Step 2: Prepare Sample Data¶

Preparing quality data is the first critical step—see the full Data Preparation guide for required folder layouts, tradeable instrument formats, and the responsibilities around adjusted Stooq prices.

For this tutorial we keep it fast by copying both the raw Stooq files and the tradeable list from docs/first_steps_bluechip/. The directory contains the curated blue-chip subset plus a few extra tickers so you can see how prepare_tradeable_data.py filters against the tradeable list.

  • Raw Stooq .txt files (columns: TICKER, PER, DATE, TIME, OPEN, HIGH, LOW, CLOSE, VOL, OPENINT) spread across market-specific directories (d_us_txt, d_uk_txt, etc.). Each Stooq file is the source of truth for its price history. Step 2 builds a metadata index, matches every broker symbol against those files, and exports cleaned per-ticker CSVs together with match/unmatched reports.
  • Tradeable instrument CSV(s) describing the broker universe (symbol, isin, market, name, currency). The script normalizes these rows, matches them to Stooq tickers, and reports flags/quality metrics (data status, flags, currency resolution).
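The matching idea can be sketched roughly as follows. This is an illustration only: the suffix map, function name, and naming rules are assumptions for the sketch, not the script's actual logic.

```python
# Illustrative sketch: normalize a broker symbol such as "CNDX:LN" into a
# Stooq-style file stem such as "cndx.uk", then check whether a raw file
# with that stem was indexed. Mapping rules here are assumptions.
MARKET_SUFFIX = {"LN": "uk", "US": "us"}  # assumed broker-market -> Stooq suffix

def to_stooq_stem(broker_symbol: str) -> str:
    """Map 'CNDX:LN' -> 'cndx.uk': lowercase ticker plus Stooq market suffix."""
    ticker, _, market = broker_symbol.partition(":")
    suffix = MARKET_SUFFIX.get(market, market.lower())
    return f"{ticker.lower()}.{suffix}"

# Pretend these stems were indexed from the raw Stooq directories:
stooq_stems = {"cndx.uk", "sgld.uk", "aapl.us"}

matched = {s: to_stooq_stem(s) in stooq_stems for s in ("CNDX:LN", "BABA:US")}
# CNDX:LN maps to cndx.uk (present); BABA:US maps to baba.us (absent),
# so a row like it would land in the unmatched report.
```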

Use the lean blue-chip tradeable list (10 curated assets) so the tutorial stays concise. The bundle is located under docs/first_steps_bluechip/ and includes the ten curated symbols below. The file also contains one intentionally unmatched example (BABA:US) to demonstrate how unmatched entries appear in the report when there is no corresponding Stooq file.

Symbol ISIN Market Description (why it’s a teachable blue-chip exposure)
AGED:LN IE00BYZK4669 GBR-LSE Ageing Population thematic ETF, shows a stable income sleeve.
AGGG:LN IE00B3F81409 GBR-LSE Global aggregate bond ETF, demonstrates fixed income linkages.
CNDX:LN IE00B53SZB19 GBR-LSE Nasdaq 100 ETF (index leg).
ECAR:LN IE00BGL86Z12 GBR-LSE Electric vehicle thematic ETF, adds growth-style exposure.
ECOM:LN IE00BF0M6N54 GBR-LSE E-commerce ETF for diversifying secular trends.
EMAD:LN IE00B466KX20 GBR-LSE Emerging Asia equity ETF to cover regional diversification.
EMRD:LN IE00B469F816 GBR-LSE Broad emerging markets equity ETF (adds EM depth).
IEMB:LN IE00B2NPKV68 GBR-LSE USD emerging markets bond ETF (fixed-income counterpoint).
INXG IE00B1FZSD53 LSE Index-linked gilt ETF (local rate exposure).
SGLD IE00B579F325 LSE Physical-gold ETC for defensive balance.

Copy the mini bundle into your working tree:

mkdir -p data/stooq tradeable_instruments
cp -r docs/first_steps_bluechip/stooq/* data/stooq/
cp docs/first_steps_bluechip/tradeable_blue_chips.csv tradeable_instruments/

Now verify the directories:

ls data/stooq/
ls tradeable_instruments/

Prepare the data to generate the match/unmatched reports before moving on:

python3.12 scripts/prepare_tradeable_data.py \
  --data-dir data/stooq \
  --metadata-output data/metadata/stooq_index.csv \
  --tradeable-dir tradeable_instruments \
  --match-report data/metadata/tradeable_matches.csv \
  --unmatched-report data/metadata/tradeable_unmatched.csv \
  --prices-output data/processed/tradeable_prices \
  --incremental

wc -l data/metadata/tradeable_matches.csv
wc -l data/metadata/tradeable_unmatched.csv
head -n 5 data/metadata/tradeable_unmatched.csv

Desired result: the match report contains the 10 curated blue-chip ETFs/ETCs, the unmatched report includes the manually added BABA:US (since no matching Stooq file was copied), and you get one exported CSV per matched ticker under data/processed/tradeable_prices/. This demonstrates how the tradeable instrument list controls which raw files survive in the pipeline.

The Stooq folder also includes tickers that are not listed in docs/first_steps_bluechip/tradeable_blue_chips.csv (e.g., QQQ5.UK, SPY4.UK, TSLA.US, AAPL.US). When you run scripts/prepare_tradeable_data.py, only the ten curated symbols produce match report rows and exported CSV files. The extra raw files do not appear in the reports at all (because they are not in the tradeable list); the unmatched report lists only tradeable rows that failed to map to a Stooq file (here, BABA:US).

What does --incremental do? It enables a fast resume: if the input directories and index CSV haven’t changed and both the match and unmatched reports already exist, the script skips reprocessing and exits quickly. Use it to speed up iteration while editing the tradeable list or when rerunning the tutorial.
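The fast-resume idea can be sketched as a freshness check on timestamps. This is a simplification: the real script's change detection may differ (per the text, it also considers whether the index CSV has changed).

```python
# Simplified sketch of an --incremental fast path: skip work when every
# output already exists and none is older than any input. The actual
# script's change-detection logic may differ from this illustration.
import tempfile
from pathlib import Path

def can_skip(inputs, outputs):
    """Return True when all outputs exist and are at least as new as the inputs."""
    if not all(p.exists() for p in outputs):
        return False
    newest_input = max(p.stat().st_mtime for p in inputs)
    oldest_output = min(p.stat().st_mtime for p in outputs)
    return oldest_output >= newest_input

with tempfile.TemporaryDirectory() as tmp:
    index_csv = Path(tmp, "stooq_index.csv")
    index_csv.write_text("ticker,path\n")
    match_report = Path(tmp, "tradeable_matches.csv")

    skip_before = can_skip([index_csv], [match_report])  # report missing -> rerun
    match_report.write_text("symbol,ticker\n")           # written after the input
    skip_after = can_skip([index_csv], [match_report])   # fresh report -> skip
```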

Step 3: Load a Pre-configured Universe¶

Run the managed pipeline (select → classify → calculate returns) using a small tutorial universe defined in config/universes_tutorial.yaml.

References: Universe Guide · YAML Reference · Management CLI

What is an asset universe?

  • A universe is a named, reusable blueprint that encodes three things: how we select assets, how we filter them after classification, and how we compute/align returns. It turns ad‑hoc flags into a declarative config you can validate, load, and compare.
  • Step 2 prepared data (built matches, exported cleaned per‑ticker prices). Step 3 applies explicit rules to that pool to curate a working set and produces three artifacts: <name>_assets.csv, <name>_classifications.csv, and <name>_returns.csv for downstream portfolio and backtesting steps.
  • Think of it as “codifying the selection + return prep you want to reuse”, not just another one‑off filter. See the Universe Guide for full details and advanced options.

This tutorial uses a ready-made universe named tutorial_bluechip that matches the Step 2 fixtures. In short:

  • filter_criteria: only “ok” data, ≥3y history, LSE/GBR-LSE markets, ETF categories in lse etfs/1 and lse etfs/2
  • classification_requirements: keep equity, fixed income, and commodity sleeves (broad and demo-friendly)
  • return_config: simple monthly returns, outer alignment, forward-fill small gaps
  • constraints: expect ~10 assets (min 8, max 12)

Minimal snippet (full file: config/universes_tutorial.yaml):

universes:
  tutorial_bluechip:
    description: "First Steps tutorial: curated LSE blue-chips (10 assets)"
    filter_criteria:
      data_status: ["ok"]
      min_history_days: 756
      min_price_rows: 756
      markets: ["GBR-LSE", "LSE"]
      currencies: ["GBP", "USD"]
      categories: ["lse etfs/1", "lse etfs/2"]
    classification_requirements:
      asset_class: ["equity", "fixed_income", "commodity"]
    return_config:
      method: "simple"
      frequency: "monthly"
      handle_missing: "forward_fill"
    constraints:
      min_assets: 8
      max_assets: 12

For detailed semantics of each block, see the Universe Guide and YAML Reference linked above.
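What the return_config block implies can be illustrated with pandas on toy prices: forward-fill a small gap, take the last price of each month, then compute simple returns. This mirrors the config's intent, not the toolkit's exact code path.

```python
# Toy illustration of return_config (method: simple, frequency: monthly,
# handle_missing: forward_fill). Not the toolkit's implementation.
import pandas as pd

prices = pd.Series(
    [100.0, 101.0, None, 103.0, 104.0, 106.09],
    index=pd.to_datetime(
        ["2023-01-30", "2023-01-31", "2023-02-27", "2023-02-28",
         "2023-03-30", "2023-03-31"]
    ),
)

# Forward-fill the gap, keep the last observation per calendar month,
# then take simple percentage changes.
monthly = prices.ffill().groupby(prices.index.to_period("M")).last()
simple_returns = monthly.pct_change().dropna()
# Feb: 103/101 - 1 ≈ 0.0198; Mar: 106.09/103 - 1 ≈ 0.03
```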

Prerequisite: tradeable matches and cleaned prices exist. If you used Step 2 fixtures, build them now:

python scripts/prepare_tradeable_data.py \
    --data-dir data/stooq \
    --tradeable-dir tradeable_instruments \
    --metadata-output data/metadata/stooq_index.csv \
    --match-report data/metadata/tradeable_matches.csv \
    --unmatched-report data/metadata/tradeable_unmatched.csv \
    --prices-output data/processed/tradeable_prices \
    --incremental

Inspect or validate:

# List available universes from the tutorial file
python scripts/manage_universes.py list --config config/universes_tutorial.yaml

# Show the full definition of the tutorial universe
python scripts/manage_universes.py show tutorial_bluechip --config config/universes_tutorial.yaml

# Validate the pipeline in-memory (no files written)
python scripts/manage_universes.py validate tutorial_bluechip --config config/universes_tutorial.yaml --verbose

Load the universe and export artifacts for tutorial_bluechip:

python scripts/manage_universes.py load tutorial_bluechip \
  --config config/universes_tutorial.yaml \
  --output-dir outputs/tutorial

Expected files (first lines):

head -n 5 outputs/tutorial/tutorial_bluechip_assets.csv

Example (first five lines of outputs/tutorial/tutorial_bluechip_assets.csv):

symbol,isin,name,market,region,currency,category,price_start,price_end,price_rows,data_status,data_flags,stooq_path,resolved_currency,currency_status
INXG,IE00B1FZSD53,GBP,LSE,uk,,lse etfs/1,2015-03-04,2025-10-03,2676,ok,,d_uk_txt/data/daily/uk/lse etfs/1/inxg.uk.txt,GBP,inferred_only
SPY4,IE00B4YBJ215,USD,LSE,uk,,lse etfs/2,2015-03-04,2025-10-03,2679,ok,,d_uk_txt/data/daily/uk/lse etfs/2/spy4.uk.txt,GBP,inferred_only
SGLD,IE00B579F325,Invesco Physical Gold ETC USD,LSE,uk,,lse etfs/2,2015-03-30,2025-10-03,2659,ok,,d_uk_txt/data/daily/uk/lse etfs/2/sgld.uk.txt,GBP,inferred_only
QQQ5,XS2399364152,USD,LSE,uk,,lse etfs/2,2021-12-17,2025-10-03,956,ok,,d_uk_txt/data/daily/uk/lse etfs/2/qqq5.uk.txt,GBP,inferred_only

Use this sample to verify that your head output matches (the tutorial assets and metadata should match the Step 2 fixtures). The QQQ5 row shows how the original fixture bundle stays in the universe even though the subsequent return calculation may drop it for insufficient coverage.

Step 4: Construct an Equal-Weight Portfolio¶

Build a simple baseline using the equal_weight strategy. It splits 1/N across all assets in your returns file (no optimization), and enforces basic guardrails like the max per-asset weight (default 25%). This makes it fast, transparent, and a great first check before trying advanced strategies.

Run it:

python scripts/construct_portfolio.py \
  --returns outputs/tutorial/tutorial_bluechip_returns.csv \
  --strategy equal_weight \
  --output outputs/tutorial/portfolio_weights.csv

What you get:

  • outputs/tutorial/portfolio_weights.csv with two columns: ticker,weight
  • Log lines confirming the number of holdings and the top positions
  • Weights that sum to 1.0; each position is approximately 1/N
  • Sample head (six evenly weighted assets; your weights should stay near 0.167):
ticker,weight
INXG,0.16666666666666666
SPY4,0.16666666666666666
SPY4:LN,0.16666666666666666
SGLD,0.16666666666666666
QQQ5,0.16666666666666666
IEMB:LN,0.16666666666666666

Notes:

  • If your universe is very small, 1/N could exceed the default 0.25 cap; use a few more assets or adjust caps later when you explore constraints.
  • Classification-based exposure limits are optional and covered in the Portfolio Construction guide; you don’t need them for this step.
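The 1/N-plus-cap behaviour described above can be sketched as follows; function and variable names are made up for this illustration, not the toolkit's internals.

```python
# Illustrative 1/N allocation with a per-asset cap. With the default 0.25
# cap, fewer than four assets makes 1/N infeasible, matching the note above.
def equal_weights(tickers, max_weight=0.25):
    w = 1.0 / len(tickers)
    if w > max_weight:
        raise ValueError(f"1/N = {w:.3f} exceeds the {max_weight:.0%} per-asset cap")
    return {t: w for t in tickers}

weights = equal_weights(["INXG", "SPY4", "SGLD", "QQQ5", "IEMB:LN", "EMRD:LN"])

try:
    equal_weights(["A", "B", "C"])  # 1/3 > 0.25 -> rejected by the cap
    cap_enforced = False
except ValueError:
    cap_enforced = True
```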

Learn more: Portfolio Construction · CLI Reference

Step 5: Build Prices/Returns and Run a Backtest¶

You’ll now create the wide prices.csv and returns.csv needed for backtesting, verify they line up with your Step 4 weights, and then run a monthly equal‑weight backtest for 2020–2023.

References: Backtesting Guide · CLI Reference

Prepare wide CSVs (date index, tickers as columns):

# Optional: deduplicate the assets list for clean column names
awk -F, 'NR==1 || !seen[$1]++' \
  outputs/tutorial/tutorial_bluechip_assets.csv \
  > outputs/tutorial/tutorial_bluechip_assets_dedup.csv

# Assemble wide prices
python scripts/assemble_prices.py \
  --assets outputs/tutorial/tutorial_bluechip_assets_dedup.csv \
  --prices-dir data/processed/tradeable_prices \
  --output outputs/tutorial/prices.csv

# Calculate wide daily returns, keep lower-coverage tickers, and print a summary
python scripts/calculate_returns.py \
  --assets outputs/tutorial/tutorial_bluechip_assets.csv \
  --prices-dir data/processed/tradeable_prices \
  --frequency daily \
  --output outputs/tutorial/returns.csv \
  --summary \
  --min-coverage 0.2

Quick check: confirm Step 4 weights align with Step 5 returns

python - << 'PY'
import pandas as pd
w = pd.read_csv('outputs/tutorial/portfolio_weights.csv', index_col=0)['weight']
r = pd.read_csv('outputs/tutorial/returns.csv', nrows=0)
tickers = set(r.columns) - {'date'}
print('Weights sum:', round(float(w.sum()), 6))
print('Missing from returns:', sorted(set(w.index) - tickers))
PY

If you adjust --min-coverage or otherwise change which tickers survive, rerun Step 4 so the equal-weight portfolio reflects the current asset list.

Expected heads (compare your first lines):

# outputs/tutorial/prices.csv
date,INXG,SPY4,SGLD,QQQ5,SPY4:LN,IEMB:LN
2015-03-04,13.441,41.86,,,41.86,86.7302
2015-03-05,13.397,41.8,,,41.8,86.8005
2015-03-06,13.349,41.44,,,41.44,86.1938
2015-03-09,13.335,41.47,,,41.47,85.8612
# outputs/tutorial/returns.csv
date,INXG,SPY4,SPY4:LN,SGLD,QQQ5,IEMB:LN
2015-03-05,-0.003273565954914126,-0.0014333492594362784,-0.0014333492594362784,,,0.000810559643584341
2015-03-06,-0.003582891692169854,-0.008612440191387516,-0.008612440191387516,,,-0.006989591073784163
2015-03-09,-0.001048767697954811,0.0007239382239383474,0.0007239382239383474,,,-0.003858746220725795
2015-03-10,0.013048368953880729,-0.010127803231251509,-0.010127803231251509,,,-0.004472334418806123
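You can confirm the relationship between the two files by hand: each returns cell is the simple day-over-day change of the matching prices cell. A tiny inline check using the INXG values from the heads above:

```python
# Verify on the INXG values shown above that returns.csv is the simple
# day-over-day percentage change of prices.csv.
import pandas as pd

inxg = pd.Series(
    [13.441, 13.397, 13.349],
    index=pd.to_datetime(["2015-03-04", "2015-03-05", "2015-03-06"]),
)
inxg_returns = inxg.pct_change().dropna()
# 13.397 / 13.441 - 1 reproduces the first INXG value in returns.csv
```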

Create a minimal universe YAML listing only the tickers you kept so the backtest matches your CSVs:

python - <<'PY'
import pandas as pd
import yaml

symbols = (
    pd.read_csv('outputs/tutorial/tutorial_bluechip_assets.csv')
    ['symbol']
    .drop_duplicates()
    .tolist()
)

with open('outputs/tutorial/tutorial_bluechip_universe.yaml', 'w') as fh:
    yaml.safe_dump(
        {'universes': {'tutorial_bluechip': {'assets': symbols}}},
        fh,
        sort_keys=False,
    )
PY

Run the backtest (equal‑weight, monthly):

python scripts/run_backtest.py equal_weight \
  --universe-file outputs/tutorial/tutorial_bluechip_universe.yaml \
  --universe-name tutorial_bluechip \
  --start-date 2020-01-02 \
  --end-date 2023-12-29 \
  --rebalance-frequency monthly \
  --prices-file outputs/tutorial/prices.csv \
  --returns-file outputs/tutorial/returns.csv \
  --output-dir outputs/tutorial/backtest

Expected output:

  • outputs/tutorial/backtest/config.json
  • outputs/tutorial/backtest/equity_curve.csv
  • outputs/tutorial/backtest/summary_report.json
  • Optional: outputs/tutorial/backtest/trades.csv (enable with --save-trades)

Need a single command to reproduce Steps 2-7? Run examples/first_steps_workflow.py (it copies the fixtures, executes each helper, and saves the entire workspace under outputs/tutorial-first-steps).

Step 6: View the Results¶

Check the backtest artifacts you just produced and compare them against the sample data below so you can confirm your output matched the tutorial workspace.

References: Backtesting Guide · CLI Reference

Inspect the equity curve (dates + equity level):

head -n 5 outputs/tutorial/backtest/equity_curve.csv

Expected head:

date,equity
2020-01-02,100000.0
2020-01-03,100242.71
2020-01-06,100244.477
2020-01-07,100288.134
2020-01-08,100234.83

Open the summary report to verify key performance/risk statistics:

cat outputs/tutorial/backtest/summary_report.json

Sample summary:

{
  "performance": {
    "total_return_pct": -3.83,
    "annualized_return_pct": -0.97,
    "volatility_pct": 15.28,
    "sharpe_ratio": -0.06,
    "sortino_ratio": -0.08,
    "max_drawdown_pct": -40.02,
    "calmar_ratio": -0.02
  },
  "risk": {
    "expected_shortfall_95_pct": -1.63,
    "win_rate_pct": 53.2,
    "avg_win_pct": 0.62,
    "avg_loss_pct": -0.7
  },
  "trading": {
    "num_rebalances": 48,
    "total_costs_usd": 976.71
  },
  "portfolio": {
    "initial_value": 100000.0,
    "final_value": 96172.67
  }
}

Typical metrics include total return, volatility, turnover, and max drawdown. Use these numbers to sanity-check the log output before digging into deeper analysis.
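As a sanity check, you can recompute headline numbers straight from the equity curve. Here is a sketch on the five sample rows above using standard textbook definitions; the toolkit's own formulas may differ in detail (e.g. annualization or rounding).

```python
# Recompute total return and max drawdown from the sample equity values
# above, using standard definitions (not necessarily the toolkit's exact
# formulas). equity mirrors the head of equity_curve.csv.
import pandas as pd

equity = pd.Series([100000.0, 100242.71, 100244.477, 100288.134, 100234.83])

total_return_pct = (equity.iloc[-1] / equity.iloc[0] - 1) * 100
drawdown = equity / equity.cummax() - 1   # distance below the running peak
max_drawdown_pct = drawdown.min() * 100
```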

Step 7: Visualize Results (Optional)¶

Use the built-in chart helper to turn the CSV diagnostics into PNGs:

python scripts/plot_charts.py outputs/tutorial/backtest

The script reads the viz_equity_curve.csv / viz_drawdown.csv files and writes equity_curve.png and drawdown.png inside outputs/tutorial/backtest. Re-run this command whenever you re-run Step 5 to keep the charts fresh.

Quick sanity check:

head -n 5 outputs/tutorial/backtest/viz_equity_curve.csv

You should see columns such as equity_normalized and equity_change_pct so you can inspect momentum manually if needed. If you need the visuals offline, open outputs/tutorial/backtest/equity_curve.png and outputs/tutorial/backtest/drawdown.png.

Example workflow script¶

Want to reproduce the entire tutorial with one command? Run:

python examples/first_steps_workflow.py \
  --workspace outputs/tutorial-first-steps

This wrapper copies the fixtures, reruns Steps 2–7, and writes everything under the workspace you choose (outputs/tutorial-first-steps by default). The resulting backtest/ folder contains the CSVs listed above plus equity_curve.png, drawdown.png, and viz_*.csv.

What You've Learned¶

  • ✅ How to load a pre-configured universe
  • ✅ How to construct a simple portfolio
  • ✅ How to run a backtest with transaction costs
  • ✅ How to interpret performance metrics

Where to Go Next¶

Keep building from here:

Common Questions¶

Q: Where is the data stored?¶

All data is stored locally:

  • Raw data: data/stooq/ (CSV files from Stooq)
  • Processed data: data/processed/ (cleaned price files)
  • Metadata: data/metadata/ (indices and match reports)
  • Outputs: outputs/ (backtest results, charts)

Q: How do I update data?¶

Re-run the data preparation step with --incremental:

python scripts/prepare_tradeable_data.py \
    --data-dir data/stooq \
    --incremental

The incremental flag skips unchanged files, reducing runtime from minutes to seconds.

Q: Can I use my own data?¶

Yes! The toolkit accepts any CSV files with:

  • Date column (YYYY-MM-DD format)
  • Close price column
  • Ticker identifier

See Data Preparation Guide for details.
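A quick pandas check of those three requirements might look like this (illustrative only, not a toolkit API):

```python
# Illustrative check (not a toolkit API) that a custom CSV carries the three
# required pieces: a YYYY-MM-DD date, a close price, and a ticker identifier.
import io
import pandas as pd

sample = io.StringIO(
    "date,ticker,close\n"
    "2024-01-02,MYFUND,101.5\n"
    "2024-01-03,MYFUND,102.0\n"
)
df = pd.read_csv(sample, parse_dates=["date"])

looks_valid = bool(
    {"date", "ticker", "close"} <= set(df.columns)
    and df["date"].is_monotonic_increasing
    and (df["close"] > 0).all()
)
```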

Q: How do I customize transaction costs?¶

Use the --commission-rate flag:

python scripts/run_backtest.py equal_weight \
    --universe-file config/universes_tutorial.yaml \
    --universe-name tutorial_bluechip \
    --commission-rate 0.001 \
    ...

Here 0.001 corresponds to 0.1% per trade.

Q: Where can I get help?¶

Tutorial Complete! 🎉¶

You've successfully:

  • ✅ Loaded a universe
  • ✅ Constructed a portfolio
  • ✅ Ran a backtest
  • ✅ Analyzed results

Next: Explore the Workflow Guide to learn advanced techniques.