Universe Management Script: `manage_universes.py`

Overview

This script is the orchestrator for the "Managed Workflow". It reads the universe definitions from the config file (config/universes.yaml by default) and runs the data processing pipeline on your behalf.

Instead of running the individual scripts (select_assets.py, classify_assets.py, calculate_returns.py) with many arguments, you use this single script with simple commands to achieve the same result in a repeatable and consistent way.

Inputs (Prerequisites)

This script requires the universes.yaml file and the outputs from the initial data preparation step:

Universe Config File (Required)
  Specified via: --config
  Default: config/universes.yaml
  Purpose: The blueprint file containing all universe definitions.
Tradeable Matches CSV (Required)
  Generated by: scripts/prepare_tradeable_data.py
  Specified via: --matches
  Default: data/metadata/tradeable_matches.csv
  Purpose: The master list of all available assets from which a universe is selected.
Cleaned Prices Directory (Required)
  Generated by: scripts/prepare_tradeable_data.py
  Specified via: --prices-dir
  Default: data/processed/tradeable_prices
  Purpose: The data store for the historical price files needed for return calculation.
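
Before running any command, it can help to confirm that the three inputs exist. The following is a minimal sketch, not part of manage_universes.py: the helper name missing_inputs is hypothetical, and the paths are simply the defaults listed above, so adjust them if you pass different values to --config, --matches, or --prices-dir.

```python
from pathlib import Path

# Default locations copied from the input list above.
DEFAULT_INPUTS = {
    "--config": Path("config/universes.yaml"),
    "--matches": Path("data/metadata/tradeable_matches.csv"),
    "--prices-dir": Path("data/processed/tradeable_prices"),
}


def missing_inputs(root=".", inputs=DEFAULT_INPUTS):
    """Return the flags whose default path does not exist under root.

    Hypothetical helper for illustration only.
    """
    root = Path(root)
    return [flag for flag, rel_path in inputs.items()
            if not (root / rel_path).exists()]
```

Calling missing_inputs() from the repository root returns an empty list when all defaults are in place. A non-empty list suggests that scripts/prepare_tradeable_data.py has not been run yet or that the config file is missing.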

Script Products

The script's products depend on the command used:

load <name> command: The main command for generating data. It writes a set of CSV files for the specified universe to the directory given by --output-dir. These typically include:
  <name>_selected.csv: The list of assets after selection.
  <name>_classified.csv: The assets with classification data.
  <name>_returns.csv: The final returns matrix, ready for portfolio construction or backtesting.
Other commands (list, show, validate, compare): These commands print their results as formatted text to the console and write no files.
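
To check the result of a load run, you can derive the expected file names from the universe name. This is a small sketch that assumes the three typical files listed above. The helper names expected_outputs and missing_outputs are hypothetical, and a given universe may produce a different set of files.

```python
from pathlib import Path

# Suffixes of the files that a load run typically produces (see the list above).
OUTPUT_SUFFIXES = ("selected", "classified", "returns")


def expected_outputs(name, output_dir):
    """Build the typical output paths for a universe (hypothetical helper)."""
    return [Path(output_dir) / f"{name}_{suffix}.csv"
            for suffix in OUTPUT_SUFFIXES]


def missing_outputs(name, output_dir):
    """Return the expected output files that are not present on disk."""
    return [path for path in expected_outputs(name, output_dir)
            if not path.is_file()]
```

For example, missing_outputs("core_global", "outputs/core_global") returns an empty list once all three files have been written.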

Features (Commands)

This script's features are exposed as a set of commands.

list: Lists the names of all universes defined in the YAML file.
show <name>: Prints the complete configuration for a single, specified universe.
validate <name>: A "dry run" command. It executes the entire data pipeline for a universe in memory (select, classify, calculate) and reports the results and any errors it encounters, without writing any files.
load <name>: The primary execution command. It runs the entire data pipeline for a universe and saves the resulting data files (selected assets, classified assets, returns) to disk.
compare <name1> <name2>: A utility that prints a side-by-side comparison of the definitions for two or more universes.
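
The difference between validate and load can be pictured as one pipeline with an optional write step. The sketch below is purely illustrative and does not reflect the script's actual internals: run_pipeline, the toy stage functions, and the placeholder rows are all hypothetical.

```python
import csv
from pathlib import Path


def run_pipeline(name, stages, output_dir=None):
    """Run each stage in order, feeding every stage the previous result.

    With output_dir=None this behaves like a dry run: results stay in memory.
    With an output_dir, each stage result is also written as <name>_<label>.csv.
    """
    results = {}
    data = None
    for label, stage in stages:
        data = stage(data)
        results[label] = data
    if output_dir is not None:
        out = Path(output_dir)
        out.mkdir(parents=True, exist_ok=True)
        for label, rows in results.items():
            with open(out / f"{name}_{label}.csv", "w", newline="") as fh:
                csv.writer(fh).writerows(rows)
    return results


# Toy stages with placeholder rows, standing in for select/classify/calculate.
def select(_):
    return [["symbol"], ["AAA"], ["BBB"]]


def classify(rows):
    return [rows[0] + ["asset_class"]] + [r + ["equity"] for r in rows[1:]]


def calculate(rows):
    symbols = [r[0] for r in rows[1:]]
    return [["date"] + symbols, ["2024-01-02", "0.01", "-0.02"]]


STAGES = [("selected", select), ("classified", classify), ("returns", calculate)]
```

In this sketch, run_pipeline("demo", STAGES) plays the role of validate, and run_pipeline("demo", STAGES, output_dir="outputs/demo") plays the role of load.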

Usage Examples

# List all available universes
python scripts/manage_universes.py list

# Check the configuration of the 'core_global' universe
python scripts/manage_universes.py show core_global

# Validate the 'core_global' universe to see the results without saving files
python scripts/manage_universes.py validate core_global --verbose

# Load the 'core_global' universe, running the full pipeline and saving the data
python scripts/manage_universes.py load core_global --output-dir outputs/core_global
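
The validate and load commands above can also be chained from Python, for instance in a scheduled job. This is a minimal sketch under one assumption that the guide does not confirm: that the script exits with a non-zero status when validation fails. The helper names build_command and validate_then_load are hypothetical.

```python
import subprocess
import sys

SCRIPT = "scripts/manage_universes.py"


def build_command(command, name, *extra):
    """Assemble the argument list for one manage_universes.py call."""
    return [sys.executable, SCRIPT, command, name, *extra]


def validate_then_load(name, output_dir):
    """Run validate first and load only if it succeeds.

    Assumption: a failed validation yields a non-zero exit code, which
    check=True turns into subprocess.CalledProcessError.
    """
    subprocess.run(build_command("validate", name), check=True)
    subprocess.run(
        build_command("load", name, "--output-dir", output_dir),
        check=True,
    )
```

Run validate_then_load("core_global", "outputs/core_global") from the repository root so that the relative script path resolves.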

CLI Reference

All available commands and flags for scripts/manage_universes.py are documented in the CLI Reference; this guide focuses on the workflow, commands, and expected outputs instead of repeating the argument list.
