Integration Testing Guidelines
This document provides guidelines for writing effective integration tests that validate multiple components working together.
What is an Integration Test?
Integration tests validate that multiple components work correctly together as a system. They differ from unit tests in that they:
- Test interactions between components
- Use real or realistic data
- May involve actual I/O operations
- Take longer to execute (seconds to minutes)
- Validate end-to-end workflows
When to Write Integration Tests
Write integration tests when you need to:
- Validate component interactions: Test that modules communicate correctly
- Test end-to-end workflows: Verify complete user scenarios work
- Catch integration bugs: Find issues that unit tests miss
- Test with real data: Validate behavior with production-like data
- Verify system contracts: Ensure APIs between components are stable
Structure
Test Organization
Integration tests live in tests/integration/:
tests/integration/
├── test_workflow.py # End-to-end workflow tests
├── test_backtest_integration.py # Backtest component integration
├── test_portfolio_integration.py # Portfolio construction integration
├── test_caching_edge_cases.py # Cache system integration
└── test_long_history_comprehensive.py # Long-term validation tests
Test File Template
"""Integration tests for <feature/workflow>.

Tests:
- Workflow 1: Description
- Workflow 2: Description
- Edge case: Description
"""
from pathlib import Path

import pytest


@pytest.mark.integration
class Test<Feature>Integration:
    """Integration tests for <feature>."""

    def test_<workflow>_end_to_end(self, tmp_path):
        """Verify <workflow> works end-to-end."""
        # Setup: Create realistic test environment
        data = create_realistic_data()
        config = create_config()

        # Execute: Run complete workflow
        result = execute_workflow(data, config)

        # Validate: Check key outcomes
        assert result.success
        assert len(result.outputs) > 0
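The `integration` marker in the template is a custom marker. pytest warns about unregistered markers, and rejects them when run with `--strict-markers`. One way to register the markers is a `pytest_configure` hook in `conftest.py`. The sketch below is a minimal illustration: the description strings are assumptions, not taken from this project's configuration.

```python
# conftest.py (sketch): register the custom markers used in this guide.
# The description strings are illustrative assumptions.


def pytest_configure(config):
    """Register custom markers so pytest does not warn about unknown marks."""
    config.addinivalue_line(
        "markers", "integration: tests that exercise several components together"
    )
    config.addinivalue_line(
        "markers", "slow: long-running tests, deselect with -m 'not slow'"
    )
```

Registering markers in one place also documents them: `pytest --markers` lists each marker with its description.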
Writing Effective Integration Tests
Test Complete Workflows
Integration tests should cover complete user scenarios:
from datetime import date
from decimal import Decimal

import pytest


@pytest.mark.integration
def test_full_backtest_workflow(tmp_path):
    """Test complete backtest from data to results."""
    # 1. Load data
    prices = load_prices("tests/fixtures/prices.csv")
    returns = calculate_returns(prices)

    # 2. Configure strategy
    config = BacktestConfig(
        start_date=date(2020, 1, 1),
        end_date=date(2023, 12, 31),
        rebalance_frequency=RebalanceFrequency.MONTHLY,
        initial_capital=Decimal(100000),
    )

    # 3. Setup components
    preselection = Preselection(PreselectionConfig(top_k=30))
    strategy = EqualWeightStrategy()

    # 4. Run backtest (attach preselection so it takes part in the run)
    engine = BacktestEngine(config, strategy, prices, returns)
    engine.set_preselection(preselection)
    result = engine.run()

    # 5. Validate results
    assert result.total_return is not None
    assert len(result.rebalance_events) > 0
    assert result.sharpe_ratio > 0

    # 6. Generate report
    report_file = tmp_path / "backtest_report.html"
    generate_report(result, report_file)
    assert report_file.exists()
Test Component Interactions
Verify that components communicate correctly:
@pytest.mark.integration
def test_preselection_with_membership_policy():
    """Verify preselection and membership policy work together."""
    # Setup components
    preselection_config = PreselectionConfig(
        method=PreselectionMethod.MOMENTUM,
        top_k=30,
        lookback=252,
    )
    preselection = Preselection(preselection_config)
    membership_policy = MembershipPolicy(
        buffer_rank=40,
        min_holding_periods=2,
        max_turnover=0.30,
    )

    # Generate test data
    returns = create_test_returns(n_assets=100, n_periods=1000)

    # First rebalance - no current holdings
    date1 = returns.index[500].date()
    selected1 = preselection.select_assets(returns[:500], date1)
    holdings1 = apply_membership_policy(
        current_holdings=[],
        preselected_ranks={asset: i for i, asset in enumerate(selected1)},
        policy=membership_policy,
        top_k=30,
    )

    # Second rebalance - with current holdings
    date2 = returns.index[600].date()
    selected2 = preselection.select_assets(returns[:600], date2)
    holdings2 = apply_membership_policy(
        current_holdings=holdings1,
        preselected_ranks={asset: i for i, asset in enumerate(selected2)},
        policy=membership_policy,
        top_k=30,
        holding_periods={asset: 1 for asset in holdings1},
    )

    # Validate interactions (holdings1 must be non-empty for the turnover ratio)
    assert 0 < len(holdings1) <= 30
    assert len(holdings2) <= 30

    # Membership policy should limit turnover
    added = set(holdings2) - set(holdings1)
    removed = set(holdings1) - set(holdings2)
    turnover = (len(added) + len(removed)) / len(holdings1)
    assert turnover <= membership_policy.max_turnover
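The turnover check above divides by the size of the previous portfolio, so it fails with a `ZeroDivisionError` if the first rebalance selects nothing. A small helper makes that edge case explicit. The function below is a hypothetical sketch, not part of the project's API, and returning 0.0 for an empty starting portfolio is an assumption.

```python
def turnover(previous, current):
    """Fraction of positions changed between two rebalances.

    Counts additions plus removals, divided by the size of the previous
    portfolio (the same formula as the test above).
    """
    prev, curr = set(previous), set(current)
    if not prev:
        # First rebalance: there is nothing to turn over. Returning 0.0 is an
        # assumption of this sketch; a project may prefer to skip the check.
        return 0.0
    added = curr - prev
    removed = prev - curr
    return (len(added) + len(removed)) / len(prev)


holdings1 = ["A", "B", "C", "D"]
holdings2 = ["A", "B", "C", "E"]
print(turnover(holdings1, holdings2))  # 0.5: one added, one removed, out of four
```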
Test with Realistic Data
Use production-like data for validation:
@pytest.mark.integration
def test_backtest_with_real_data_patterns():
    """Test backtest with realistic data patterns."""
    # Create data with realistic characteristics
    returns = generate_realistic_returns(
        n_assets=100,
        n_periods=1000,
        mean_return=0.0003,  # ~7.6% annualized (x 252 trading days)
        volatility=0.01,  # ~16% annualized (x sqrt(252))
        correlation=0.3,  # Moderate correlation
    )

    # Add realistic gaps and missing data
    returns = add_random_gaps(returns, gap_probability=0.02)

    # Run backtest
    config = BacktestConfig(
        start_date=returns.index[0].date(),
        end_date=returns.index[-1].date(),
        rebalance_frequency=RebalanceFrequency.MONTHLY,
        use_pit_eligibility=True,
        min_history_days=252,
    )
    strategy = RiskParityStrategy()
    prices = returns_to_prices(returns, initial_price=100)
    engine = BacktestEngine(config, strategy, prices, returns)
    result = engine.run()

    # Validate realistic outcomes
    assert -0.5 < result.total_return < 2.0  # Reasonable range
    assert 0 < result.sharpe_ratio < 3.0  # Reasonable Sharpe
    assert result.max_drawdown < 0.6  # Not catastrophic
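The annualized figures quoted in the comments follow from the daily parameters by simple scaling. The arithmetic below assumes 252 trading days per year and independent daily returns, so the mean scales linearly with time and the volatility with its square root. Both are common conventions, not something this guide specifies.

```python
import math

TRADING_DAYS = 252  # assumed trading days per year

daily_mean = 0.0003
daily_vol = 0.01

# Arithmetic annualization: the mean scales with time, volatility with sqrt(time).
annual_mean = daily_mean * TRADING_DAYS
annual_vol = daily_vol * math.sqrt(TRADING_DAYS)

print(f"annualized mean:       {annual_mean:.2%}")  # 7.56%
print(f"annualized volatility: {annual_vol:.2%}")   # 15.87%
```

Checking generated data against these targets before running the backtest helps separate "the generator is off" from "the engine is off" when a range assertion fails.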
Test Error Handling
Integration tests should verify error handling across components:
@pytest.mark.integration
def test_backtest_handles_insufficient_data_gracefully():
    """Verify backtest handles data issues appropriately."""
    # Create dataset with insufficient history
    returns = create_test_returns(n_assets=50, n_periods=100)
    prices = returns_to_prices(returns, initial_price=100)

    # Configure backtest with requirements that can't be met
    config = BacktestConfig(
        start_date=returns.index[0].date(),
        end_date=returns.index[-1].date(),
        use_pit_eligibility=True,
        min_history_days=365,  # More than available
    )
    strategy = EqualWeightStrategy()
    engine = BacktestEngine(config, strategy, prices, returns)

    # Should raise appropriate error
    with pytest.raises(InsufficientHistoryError):
        engine.run()
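When asserting on errors, it often helps to check the message as well as the type, so the test does not pass on an unrelated failure of the same class. The sketch below uses a hypothetical stand-in exception and a hypothetical `require_history` helper. The project's real `InsufficientHistoryError` may have a different base class and message.

```python
class InsufficientHistoryError(ValueError):
    """Hypothetical stand-in for the project's exception of the same name."""


def require_history(n_available, min_history_days):
    """Raise if fewer observations are available than the configuration demands."""
    if n_available < min_history_days:
        raise InsufficientHistoryError(
            f"need {min_history_days} days of history, only {n_available} available"
        )


try:
    require_history(n_available=100, min_history_days=365)
except InsufficientHistoryError as exc:
    message = str(exc)
    print(message)  # need 365 days of history, only 100 available
```

In a pytest test the same check can be written as `pytest.raises(InsufficientHistoryError, match="365")`, where `match` is applied as a regular expression search against the message.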
Using Fixtures
Integration tests often need complex setup. Use fixtures to share that setup between tests:
from datetime import date
from decimal import Decimal

import pandas as pd
import pytest


@pytest.fixture
def realistic_market_data():
    """Create realistic market data for integration tests."""
    # Load or generate comprehensive test data
    dates = pd.date_range("2015-01-01", "2023-12-31", freq="D")
    assets = [f"ASSET_{i:03d}" for i in range(200)]

    # Generate with realistic properties
    returns = generate_correlated_returns(
        dates=dates,
        assets=assets,
        mean=0.0003,
        vol=0.01,
        correlation_matrix=create_block_correlation_matrix(200, 0.3),
    )
    return {
        "returns": returns,
        "prices": returns_to_prices(returns, 100),
        "dates": dates,
        "assets": assets,
    }


@pytest.fixture
def backtest_config():
    """Create standard backtest configuration."""
    return BacktestConfig(
        start_date=date(2020, 1, 1),
        end_date=date(2023, 12, 31),
        rebalance_frequency=RebalanceFrequency.MONTHLY,
        initial_capital=Decimal(100000),
        use_pit_eligibility=True,
        min_history_days=252,
    )


@pytest.mark.integration
def test_with_fixtures(realistic_market_data, backtest_config):
    """Test using integration fixtures."""
    strategy = EqualWeightStrategy()
    engine = BacktestEngine(
        backtest_config,
        strategy,
        realistic_market_data["prices"],
        realistic_market_data["returns"],
    )
    result = engine.run()
    assert result.total_return is not None
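Generated fixture data should be reproducible, otherwise a failing run cannot be replayed. The implementation of helpers such as `create_test_returns` is not shown in this guide. The sketch below is a hypothetical standard-library stand-in that illustrates one approach: draw from a locally seeded generator instead of the global random state.

```python
import random


def make_returns(n_assets, n_periods, mean=0.0003, vol=0.01, seed=42):
    """Return {asset_name: [daily returns]} drawn from a seeded normal distribution."""
    rng = random.Random(seed)  # local generator: no global random state is touched
    return {
        f"ASSET_{i:03d}": [rng.gauss(mean, vol) for _ in range(n_periods)]
        for i in range(n_assets)
    }


first = make_returns(n_assets=3, n_periods=5)
second = make_returns(n_assets=3, n_periods=5)
assert first == second  # same seed, same data
assert list(first) == ["ASSET_000", "ASSET_001", "ASSET_002"]
```

Exposing the seed as a parameter lets a single test vary it deliberately while every other test keeps the default.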
Testing Across Multiple Components
Data Pipeline Integration
@pytest.mark.integration
def test_data_pipeline_end_to_end(tmp_path):
    """Test complete data pipeline from raw to processed."""
    # 1. Setup: Create raw data files
    raw_dir = tmp_path / "raw"
    raw_dir.mkdir()
    create_raw_stooq_files(raw_dir, n_assets=50)

    # 2. Index building
    index_file = tmp_path / "index.csv"
    build_index(raw_dir, index_file)
    assert index_file.exists()

    # 3. Matching with brokers
    broker_files = create_broker_files(tmp_path)
    matches = match_tickers(index_file, broker_files)
    assert len(matches) > 0

    # 4. Data validation
    diagnostics = validate_data(raw_dir, matches)
    assert diagnostics["total_files"] == 50

    # 5. Export processed data
    processed_dir = tmp_path / "processed"
    export_processed_data(raw_dir, matches, processed_dir)
    assert (processed_dir / "prices.csv").exists()
    assert (processed_dir / "returns.csv").exists()
Strategy Pipeline Integration
@pytest.mark.integration
def test_strategy_pipeline_with_all_features(tmp_path):
    """Test strategy with preselection, membership, and optimization."""
    # Setup data
    returns = create_test_returns(n_assets=200, n_periods=2000)
    prices = returns_to_prices(returns, 100)

    # Configure all components
    preselection_config = PreselectionConfig(
        method=PreselectionMethod.COMBINED,
        top_k=50,
        momentum_weight=0.6,
        low_vol_weight=0.4,
    )
    membership_policy = MembershipPolicy(
        buffer_rank=60,
        min_holding_periods=3,
        max_turnover=0.25,
    )
    strategy = MeanVarianceStrategy(
        risk_aversion=1.0,
        weight_bounds=(0.01, 0.10),
    )
    config = BacktestConfig(
        start_date=returns.index[500].date(),
        end_date=returns.index[-1].date(),
        rebalance_frequency=RebalanceFrequency.QUARTERLY,
        use_pit_eligibility=True,
        min_history_days=252,
    )

    # Execute full pipeline (cache lives under tmp_path so no artifacts remain)
    cache = FactorCache(tmp_path / "factor_cache", enabled=True)
    preselection = Preselection(preselection_config, cache=cache)
    engine = BacktestEngine(config, strategy, prices, returns)
    engine.set_preselection(preselection)
    engine.set_membership_policy(membership_policy)
    result = engine.run()

    # Validate all features worked
    assert len(result.rebalance_events) > 0
    assert result.total_return is not None

    # Verify preselection was used
    for event in result.rebalance_events:
        assert len(event.new_weights) <= 50

    # Clean up cached factors
    cache.clear()
Performance Validation
Integration tests can validate performance requirements:
@pytest.mark.integration
def test_backtest_performance_acceptable():
    """Verify backtest completes in reasonable time."""
    import time

    # Setup realistic scenario
    returns = create_test_returns(n_assets=100, n_periods=2500)  # ~10 years daily
    prices = returns_to_prices(returns, 100)
    config = BacktestConfig(
        start_date=returns.index[0].date(),
        end_date=returns.index[-1].date(),
        rebalance_frequency=RebalanceFrequency.MONTHLY,  # ~120 rebalances
    )
    strategy = EqualWeightStrategy()
    engine = BacktestEngine(config, strategy, prices, returns)

    # Measure execution time (perf_counter is monotonic, unlike time.time)
    start = time.perf_counter()
    result = engine.run()
    duration = time.perf_counter() - start

    # Verify reasonable performance
    assert duration < 30, f"Backtest took {duration:.1f}s (expected <30s)"
    assert result is not None
@pytest.mark.integration
def test_caching_improves_performance(tmp_path):
    """Verify caching provides performance benefit."""
    import time

    returns = create_test_returns(n_assets=200, n_periods=1000)
    config = PreselectionConfig(top_k=50, lookback=252)
    rebalance_points = range(100, 900, 100)

    # Baseline: run without a cache
    preselection_no_cache = Preselection(config, cache=None)
    start = time.perf_counter()
    for i in rebalance_points:
        as_of = returns.index[i].date()
        preselection_no_cache.select_assets(returns[:i], as_of)
    time_no_cache = time.perf_counter() - start

    # Cold run: start from an empty cache and populate it
    cache = FactorCache(tmp_path / "factor_cache", enabled=True)
    cache.clear()
    preselection_with_cache = Preselection(config, cache=cache)
    for i in rebalance_points:
        as_of = returns.index[i].date()
        preselection_with_cache.select_assets(returns[:i], as_of)

    # Warm run: repeat the same dates. The cache must NOT be cleared here,
    # otherwise the repeat run would be measured against an empty cache.
    start = time.perf_counter()
    for i in rebalance_points:
        as_of = returns.index[i].date()
        preselection_with_cache.select_assets(returns[:i], as_of)
    time_cached_repeat = time.perf_counter() - start

    # Clean up, then check that the cached run is much faster
    cache.clear()
    assert time_cached_repeat < time_no_cache * 0.5
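Wall-clock comparisons such as the 0.5 factor above can be noisy on shared CI machines. A complementary check that is deterministic is to count how often the expensive computation actually runs. The sketch below uses `functools.lru_cache` as a stand-in for the project's `FactorCache`. The function names are hypothetical.

```python
import functools
import time

calls = {"count": 0}


@functools.lru_cache(maxsize=None)
def expensive_factor(period_end):
    """Stand-in for an expensive factor computation (hypothetical)."""
    calls["count"] += 1
    return sum(i * i for i in range(period_end))


def run_all():
    """Evaluate the factor for every rebalance point and return the elapsed time."""
    start = time.perf_counter()
    for period_end in range(100, 900, 100):
        expensive_factor(period_end)
    return time.perf_counter() - start


cold = run_all()  # populates the cache: 8 computations
warm = run_all()  # served from the cache: no new computations
assert calls["count"] == 8
```

Asserting on the call count (or on a hit counter, if the cache exposes one) keeps the test stable, and the timing comparison can then use a looser threshold.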
Testing Data Integrity
Verify data flows correctly through the system:
import pandas as pd
import pytest


@pytest.mark.integration
def test_data_integrity_through_pipeline():
    """Verify data maintains integrity through processing pipeline."""
    # Create known test data
    initial_prices = pd.DataFrame(
        {
            "ASSET_001": [100, 105, 110, 108, 112],
            "ASSET_002": [50, 52, 51, 53, 54],
        },
        index=pd.date_range("2023-01-01", periods=5, freq="D"),
    )

    # Calculate returns
    returns = calculate_returns(initial_prices)

    # Reconstruct prices from returns
    reconstructed = returns_to_prices(returns, initial_prices.iloc[0])

    # Verify integrity. The inputs are integers and the reconstructed prices
    # are floats, so dtypes are not compared.
    pd.testing.assert_frame_equal(
        initial_prices.iloc[1:],
        reconstructed,
        check_dtype=False,
        atol=0.01,
    )
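The comparison against `initial_prices.iloc[1:]` reflects that the first price has no return, so the reconstructed series is one row shorter. The pure-Python round trip below shows the arithmetic for the first asset. It assumes simple (not log) returns, which this guide does not state explicitly.

```python
import math

prices = [100.0, 105.0, 110.0, 108.0, 112.0]

# Simple returns: r_t = p_t / p_{t-1} - 1 (the first price has no return).
returns = [curr / prev - 1 for prev, curr in zip(prices, prices[1:])]

# Rebuild prices by compounding from the first observed price.
rebuilt = []
level = prices[0]
for r in returns:
    level *= 1 + r
    rebuilt.append(level)

# The rebuilt series lines up with prices[1:], one element shorter than the input.
assert len(rebuilt) == len(prices) - 1
assert all(math.isclose(a, b, rel_tol=1e-9) for a, b in zip(rebuilt, prices[1:]))
```

The tolerance is needed because floating-point division and multiplication do not round-trip exactly, which is also why the test above passes `atol` to the comparison.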
Best Practices
DO
✅ Test realistic scenarios: Use production-like data and configurations
✅ Test component boundaries: Verify interfaces between modules
✅ Test error propagation: Ensure errors flow correctly through system
✅ Use appropriate timeouts: Set reasonable time limits for long tests
✅ Clean up resources: Remove temporary files, close connections
✅ Mark tests clearly: Use @pytest.mark.integration
DON'T
❌ Test everything together: Keep tests focused on specific integration points
❌ Skip error cases: Integration tests should cover error scenarios
❌ Ignore performance: Monitor and assert on execution time
❌ Leave artifacts: Clean up temporary files and state
❌ Depend on test order: Each test should be independent
Running Integration Tests
# Run all integration tests
pytest tests/integration/ -v
# Run specific integration test file
pytest tests/integration/test_backtest_integration.py -v
# Run integration tests excluding slow ones
pytest tests/integration/ -m "integration and not slow" -v
# Run with detailed output
pytest tests/integration/ -v -s
# Run in parallel (requires the pytest-xdist plugin; tests must be independent)
pytest tests/integration/ -n 4
# Run with coverage (requires the pytest-cov plugin)
pytest tests/integration/ --cov=src/portfolio_management
Troubleshooting Integration Tests
Common Issues
- Tests fail due to missing data:
  - Ensure fixtures are properly set up
  - Check that test data files exist
  - Verify file paths are correct
- Tests are too slow:
  - Reduce data size if possible
  - Mark as @pytest.mark.slow if necessary
  - Consider mocking expensive operations
- Tests fail intermittently:
  - Check for race conditions
  - Ensure proper cleanup between tests
  - Verify no global state leakage
- Resource leaks:
  - Use context managers for file operations
  - Clean up in finally blocks or fixtures
  - Check for open connections
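The resource-leak advice can be illustrated with the standard library alone. The sketch below uses a hypothetical `write_report` helper. The file handle is closed by its context manager, and the temporary directory is removed on exit even if the body raises.

```python
import tempfile
from pathlib import Path


def write_report(directory, text):
    """Write a small report file and return its path (hypothetical helper)."""
    path = Path(directory) / "report.txt"
    # The context manager closes the file handle even if the write fails.
    with path.open("w", encoding="utf-8") as handle:
        handle.write(text)
    return path


# TemporaryDirectory removes the directory and its contents on exit.
with tempfile.TemporaryDirectory() as workdir:
    report = write_report(workdir, "total_return=0.12\n")
    content = report.read_text(encoding="utf-8")
    existed_inside = report.exists()

exists_after = report.exists()  # False: the directory is gone
```

Inside pytest tests, the built-in `tmp_path` fixture serves the same purpose and gives each test its own directory.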
Related Documentation
- Test Strategy - Overall testing philosophy
- Unit Testing - Unit test guidelines
- Long History Tests - Long-term validation
- Testing Overview - Test organization
- Troubleshooting Guide - Debugging help