Universe Management Script: manage_universes.py
Overview
This script is the primary orchestrator for the "Managed Workflow". It reads the universe definitions from config/universes.yaml and runs the data processing pipeline for you.
Instead of running the individual scripts (select_assets.py, classify_assets.py, calculate_returns.py) by hand, each with its own set of arguments, you run this single script with a short command and get the same result in a repeatable, consistent way.
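The orchestration idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the script's actual code: the `universes` mapping stands in for the parsed contents of config/universes.yaml (its layout is assumed), and the stage functions are toy stand-ins for the work done by select_assets.py, classify_assets.py and calculate_returns.py. `run_universe` is a hypothetical name. In this picture, `validate` would stop after building the outputs in memory, while `load` would also write them to disk.

```python
from __future__ import annotations

from typing import Any, Callable

# Hypothetical stage signature: (universe definition, previous stage output) -> output.
Stage = Callable[[dict, Any], Any]


def run_universe(name: str, universes: dict[str, dict],
                 stages: list[tuple[str, Stage]]) -> dict[str, Any]:
    """Run each stage in order for one named universe and keep every stage's output.

    `universes` stands in for the parsed contents of config/universes.yaml,
    keyed by universe name (an assumption about the file's layout).
    """
    if name not in universes:
        raise KeyError(f"unknown universe {name!r}; defined: {sorted(universes)}")
    definition = universes[name]
    outputs: dict[str, Any] = {}
    previous: Any = None
    for stage_name, stage in stages:
        # Each stage sees the same definition plus the previous stage's result.
        previous = stage(definition, previous)
        outputs[stage_name] = previous
    return outputs


# Toy stand-ins for the select -> classify -> returns chain (not the real logic).
universes = {"core_global": {"max_assets": 2}}
stages = [
    ("selected", lambda d, _: ["A", "B", "C"][: d["max_assets"]]),
    ("classified", lambda d, assets: {a: "equity" for a in assets}),
    ("returns", lambda d, classified: {a: 0.0 for a in classified}),
]
result = run_universe("core_global", universes, stages)
print(result["selected"])  # ['A', 'B']
```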
Inputs (Prerequisites)
This script requires the universes.yaml file and the outputs from the initial data preparation step:
- Universe Config File (Required)
  - Specified via: --config
  - Default: config/universes.yaml
  - Purpose: The blueprint file containing all universe definitions.
- Tradeable Matches CSV (Required)
  - Generated by: scripts/prepare_tradeable_data.py
  - Specified via: --matches
  - Default: data/metadata/tradeable_matches.csv
  - Purpose: The master list of all available assets from which a universe is selected.
- Cleaned Prices Directory (Required)
  - Generated by: scripts/prepare_tradeable_data.py
  - Specified via: --prices-dir
  - Default: data/processed/tradeable_prices
  - Purpose: The data store for the historical price files needed for return calculation.
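Before running any command, it can help to confirm that the three required inputs exist. The following is a minimal pre-flight sketch that uses only the default paths listed above; it is not part of manage_universes.py, and the function name `missing_prerequisites` is hypothetical.

```python
from __future__ import annotations

from pathlib import Path

# Default locations copied from the Inputs list; pass other paths to mirror
# what you would give via --config, --matches and --prices-dir.
DEFAULTS = {
    "--config": Path("config/universes.yaml"),
    "--matches": Path("data/metadata/tradeable_matches.csv"),
    "--prices-dir": Path("data/processed/tradeable_prices"),
}


def missing_prerequisites(config: Path = DEFAULTS["--config"],
                          matches: Path = DEFAULTS["--matches"],
                          prices_dir: Path = DEFAULTS["--prices-dir"]) -> list[str]:
    """Return one human-readable note per required input that is absent."""
    problems = []
    if not config.is_file():
        problems.append(f"--config: file not found: {config}")
    if not matches.is_file():
        problems.append(f"--matches: file not found: {matches} "
                        "(run scripts/prepare_tradeable_data.py first)")
    if not prices_dir.is_dir():
        problems.append(f"--prices-dir: directory not found: {prices_dir} "
                        "(run scripts/prepare_tradeable_data.py first)")
    return problems


if __name__ == "__main__":
    # Prints nothing when all three defaults are present.
    for problem in missing_prerequisites():
        print(problem)
```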
Script Products
The script's products depend on the command used:
- load <name> command: This is the main command for generating data. It produces a set of CSV files for the specified universe, saved to the --output-dir. These typically include:
  - <name>_selected.csv: The list of assets after selection.
  - <name>_classified.csv: The assets with classification data.
  - <name>_returns.csv: The final returns matrix, ready for portfolio construction or backtesting.
- Other commands (list, show, validate, compare): These commands print their results as formatted text directly to the console.
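Because the load outputs follow a `<name>_<stage>.csv` pattern, a downstream step can derive their paths from the universe name and the --output-dir value. A minimal sketch, assuming the three file names listed above are the ones you need (the script may write others); `expected_load_outputs` and `missing_load_outputs` are hypothetical helpers.

```python
from __future__ import annotations

from pathlib import Path

# Suffixes taken from the file names listed for the load command.
LOAD_SUFFIXES = ("selected", "classified", "returns")


def expected_load_outputs(name: str, output_dir: str | Path) -> dict[str, Path]:
    """Map each pipeline stage to the CSV that `load <name>` is expected to write."""
    out = Path(output_dir)
    return {suffix: out / f"{name}_{suffix}.csv" for suffix in LOAD_SUFFIXES}


def missing_load_outputs(name: str, output_dir: str | Path) -> list[Path]:
    """List expected CSVs that are not on disk, e.g. after an interrupted load."""
    return [p for p in expected_load_outputs(name, output_dir).values() if not p.is_file()]


paths = expected_load_outputs("core_global", "outputs/core_global")
print(paths["returns"])
```

A backtest or portfolio step can then read `paths["returns"]` instead of hard-coding the file name.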
Features (Commands)
This script's features are exposed as a set of commands.
- list: Lists the names of all universes defined in the YAML file.
- show <name>: Prints the complete configuration for a single universe.
- validate <name>: A "dry run" command. It executes the entire data pipeline for a universe in memory (select, classify, calculate) and reports the results and any errors without writing any files.
- load <name>: The primary execution command. It runs the entire data pipeline for a universe and saves the resulting data files (selected assets, classified assets, returns) to disk.
- compare <name1> <name2>: Prints a side-by-side comparison of the definitions of two or more universes.
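One way to picture how these commands fit together is an argparse subcommand layout. The sketch below is hypothetical: it reuses flag names and defaults that appear in this guide, but which command accepts which flag, the missing --output-dir default, and the help text are assumptions; the CLI Reference remains the authority on the real interface.

```python
from __future__ import annotations

import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical sketch of the command layout, not the script's real parser."""
    # Options shared by every command; defaults copied from the Inputs section.
    common = argparse.ArgumentParser(add_help=False)
    common.add_argument("--config", default="config/universes.yaml")
    common.add_argument("--matches", default="data/metadata/tradeable_matches.csv")
    common.add_argument("--prices-dir", default="data/processed/tradeable_prices")
    common.add_argument("--verbose", action="store_true")

    parser = argparse.ArgumentParser(prog="manage_universes.py")
    sub = parser.add_subparsers(dest="command", required=True)

    sub.add_parser("list", parents=[common], help="list universe names")
    sub.add_parser("show", parents=[common], help="print one definition").add_argument("name")
    sub.add_parser("validate", parents=[common], help="dry run, writes nothing").add_argument("name")

    load = sub.add_parser("load", parents=[common], help="run pipeline and save CSVs")
    load.add_argument("name")
    load.add_argument("--output-dir")  # default not stated in this guide

    compare = sub.add_parser("compare", parents=[common], help="compare definitions")
    compare.add_argument("names", nargs="+")
    return parser


def parse(argv: list[str]) -> argparse.Namespace:
    parser = build_parser()
    args = parser.parse_args(argv)
    # argparse has no "at least two" nargs, so enforce it after parsing.
    if args.command == "compare" and len(args.names) < 2:
        parser.error("compare needs at least two universe names")
    return args


args = parse(["load", "core_global", "--output-dir", "outputs/core_global"])
print(args.command, args.name, args.output_dir)
```

Sharing one parent parser keeps the input flags consistent across commands, which matches the guide's point that every command reads the same configuration and prepared data.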
Usage Examples
```bash
# List all available universes
python scripts/manage_universes.py list

# Check the configuration of the 'core_global' universe
python scripts/manage_universes.py show core_global

# Validate the 'core_global' universe to see the results without saving files
python scripts/manage_universes.py validate core_global --verbose

# Load the 'core_global' universe, running the full pipeline and saving the data
python scripts/manage_universes.py load core_global --output-dir outputs/core_global
```
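To refresh several universes in one go, the same commands can be driven from a small Python wrapper that validates each universe before loading it. A minimal sketch, assuming it runs from the repository root and that manage_universes.py exits with a non-zero status when validation or loading fails (this guide does not state that); `commands_for` and `run_all` are hypothetical names, and the outputs/<name> layout simply mirrors the load example above. With `dry_run=True` it only prints the commands.

```python
from __future__ import annotations

import subprocess
import sys

SCRIPT = "scripts/manage_universes.py"


def commands_for(name: str, output_root: str = "outputs") -> list[list[str]]:
    """Validate first (writes nothing), then load into <output_root>/<name>."""
    return [
        [sys.executable, SCRIPT, "validate", name],
        [sys.executable, SCRIPT, "load", name, "--output-dir", f"{output_root}/{name}"],
    ]


def run_all(names: list[str], output_root: str = "outputs", dry_run: bool = True) -> None:
    for name in names:
        for cmd in commands_for(name, output_root):
            print(" ".join(cmd))
            if not dry_run:
                # check=True stops the batch at the first failing command,
                # assuming the script signals failure through its exit code.
                subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all(["core_global"])
```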
CLI Reference
All available commands and flags for scripts/manage_universes.py are documented in the CLI Reference; this guide focuses on the workflow, commands, and expected outputs instead of repeating the argument list.