Universe Management Script: manage_universes.py

Overview

This script is the orchestrator for the "Managed Workflow". It reads the universe definitions from the config/universes.yaml file and runs the data processing pipeline on your behalf.

Instead of running the individual scripts (select_assets.py, classify_assets.py, calculate_returns.py) with many arguments, you use this single script with simple commands to achieve the same result in a repeatable and consistent way.

Inputs (Prerequisites)

This script requires the universes.yaml file and the outputs from the initial data preparation step:

  1. Universe Config File (Required)
     • Specified via: --config
     • Default: config/universes.yaml
     • Purpose: The blueprint file containing all universe definitions.

  2. Tradeable Matches CSV (Required)
     • Generated by: scripts/prepare_tradeable_data.py
     • Specified via: --matches
     • Default: data/metadata/tradeable_matches.csv
     • Purpose: The master list of all available assets from which a universe is selected.

  3. Cleaned Prices Directory (Required)
     • Generated by: scripts/prepare_tradeable_data.py
     • Specified via: --prices-dir
     • Default: data/processed/tradeable_prices
     • Purpose: The data store for the historical price files needed for return calculation.
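Before running the script, it can help to confirm that the three inputs exist at their default locations. The following is a minimal sketch, not part of manage_universes.py itself: the helper name missing_inputs is hypothetical, and it assumes you run it from the project root and have not overridden the defaults on the command line.

```python
from pathlib import Path

# Default locations as listed above, relative to the project root.
DEFAULT_INPUTS = {
    "--config": Path("config/universes.yaml"),
    "--matches": Path("data/metadata/tradeable_matches.csv"),
    "--prices-dir": Path("data/processed/tradeable_prices"),
}


def missing_inputs(root=".", inputs=DEFAULT_INPUTS):
    """Return the flags whose default path does not exist under root.

    Hypothetical helper for illustration; the real script may perform
    its own checks and report errors differently.
    """
    return [flag for flag, rel in inputs.items() if not (Path(root) / rel).exists()]


if __name__ == "__main__":
    for flag in missing_inputs():
        print(f"missing input for {flag}: {DEFAULT_INPUTS[flag]}")
```

If the matches CSV or the prices directory is reported missing, run scripts/prepare_tradeable_data.py first, since it generates both.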

Script Products

The script's products depend on the command used:

  • load <name> command: This is the main command for generating data. It produces a set of CSV files for the specified universe, saved to the --output-dir. These typically include:
     • <name>_selected.csv: The list of assets after selection.
     • <name>_classified.csv: The assets with classification data.
     • <name>_returns.csv: The final returns matrix, ready for portfolio construction or backtesting.
  • Other commands (list, show, validate, compare): These commands print their results as formatted text directly to the console.
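The naming pattern for the load outputs can be sketched as follows. This is an illustration of the file names listed above only: the helper expected_outputs is hypothetical, and because the guide says the outputs "typically" include these files, the actual set may differ for a given universe.

```python
from pathlib import Path

# Suffixes taken from the file names listed above.
SUFFIXES = ("selected", "classified", "returns")


def expected_outputs(name, output_dir):
    """Return the paths that `load <name>` typically writes to output_dir.

    Hypothetical helper: it only builds the paths, it does not check
    that the files exist or run the pipeline.
    """
    return [Path(output_dir) / f"{name}_{suffix}.csv" for suffix in SUFFIXES]


if __name__ == "__main__":
    for path in expected_outputs("core_global", "outputs/core_global"):
        print(path.as_posix())
```

A downstream step could use a helper like this to check that the files exist after a load run before it starts portfolio construction or backtesting.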

Features (Commands)

This script's features are exposed as a set of commands.

  • list: Lists the names of all universes defined in the YAML file.
  • show <name>: Prints the complete configuration for a single, specified universe.
  • validate <name>: A "dry run" command. It executes the entire data pipeline for a universe in memory (select, classify, calculate) and reports the results and any potential errors without writing any files.
  • load <name>: The primary execution command. It runs the entire data pipeline for a universe and saves the resulting data files (selected assets, classified assets, returns) to disk.
  • compare <name1> <name2>: A utility that prints a side-by-side comparison of the definitions for two or more universes.
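Because validate writes no files, a natural pattern is to run it before load and write data only if it succeeds. The sketch below wraps the two commands with subprocess. It rests on an assumption the guide does not confirm: that validate exits with a non-zero status when it finds errors. The function names build_command and validate_then_load are hypothetical.

```python
import subprocess
import sys

SCRIPT = "scripts/manage_universes.py"


def build_command(command, name, *extra):
    """Build the argument list for one manage_universes.py invocation."""
    return [sys.executable, SCRIPT, command, name, *extra]


def validate_then_load(name, output_dir, runner=subprocess.run):
    """Run validate first and run load only if validate exits with code 0.

    ASSUMPTION: validate signals failure through a non-zero exit code.
    If it only prints errors, inspect its console output instead.
    """
    check = runner(build_command("validate", name))
    if check.returncode != 0:
        return False
    result = runner(build_command("load", name, "--output-dir", output_dir))
    return result.returncode == 0
```

Calling validate_then_load("core_global", "outputs/core_global") from the project root would run the same two commands shown in the usage examples, in order. The runner parameter exists only so the sequence can be exercised without invoking the real script.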

Usage Examples

# List all available universes
python scripts/manage_universes.py list

# Check the configuration of the 'core_global' universe
python scripts/manage_universes.py show core_global

# Validate the 'core_global' universe to see the results without saving files
python scripts/manage_universes.py validate core_global --verbose

# Load the 'core_global' universe, running the full pipeline and saving the data
python scripts/manage_universes.py load core_global --output-dir outputs/core_global

CLI Reference

All available commands and flags for scripts/manage_universes.py are documented in the CLI Reference; this guide focuses on the workflow, commands, and expected outputs instead of repeating the argument list.