Mutation Testing

This document outlines the process and best practices for mutation testing in this project. Mutation testing is a powerful technique for evaluating the quality of a test suite by introducing small, deliberate changes (mutations) into the source code and checking if the tests can detect them.

Overview

The goal of mutation testing is to identify "weak tests": tests that pass even when the underlying code's behavior has been altered. A high mutation score (the share of mutants that the tests kill) indicates a robust test suite that is sensitive to changes in the code.

Killed Mutant: A mutation that causes at least one test to fail. This is the desired outcome.
Survived Mutant: A mutation that does not cause any test to fail. This indicates a weakness in the test suite.
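
The document does not define the mutation score, so the following is a minimal sketch that assumes the common definition: killed mutants divided by all evaluated mutants. The function name `mutation_score` is hypothetical and not part of this project.

```python
def mutation_score(killed, survived):
    """Fraction of evaluated mutants that the test suite killed.

    Assumes the common definition killed / (killed + survived);
    the project may count mutants differently.
    """
    total = killed + survived
    if total == 0:
        raise ValueError("no mutants were evaluated")
    return killed / total
```

For example, a session with 18 killed and 2 surviving mutants would score 18 / 20 under this definition, and each of the 2 survivors points at a specific gap to investigate.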

Workflow

Due to instability with the mutmut test runner in this project's environment, we use a manual workflow for mutation testing:

1. Identify a Target Module: Select a critical module for analysis (e.g., a portfolio strategy or a core calculation utility).
2. Introduce a Manual Mutation: Make a small, plausible change to the source code of the target module. Good candidates for mutation include:
   - Changing comparison operators (e.g., < to <=).
   - Altering arithmetic operators (e.g., + to -).
   - Modifying constants or magic numbers (e.g., 0.05 to 0.95).
   - Commenting out or deleting lines of code.
3. Run the Tests: Execute the relevant test suite for the mutated module using pytest.
4. Analyze the Results:
   - If the tests fail, the mutant is "killed." This is good! Restore the original code and move on to the next mutation.
   - If the tests pass, the mutant has "survived." This indicates a gap in the test suite.
5. Strengthen the Tests: If a mutant survives, analyze why the existing tests did not catch the change. Strengthen the tests by:
   - Adding more specific assertions.
   - Creating new test cases for uncovered edge cases.
   - Improving the quality of test data to trigger the mutated code path.
6. Verify the Fix: Re-run the tests with the mutated code to ensure that the strengthened test now fails, successfully "killing" the mutant.
7. Restore the Code: Always restore the original source code before moving on to the next mutation.
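
The steps above are done by hand in this project, but the restore step is easy to forget. The following is a minimal sketch of one way to make a single mutation self-restoring. It is not the project's tooling: the helper names `mutated` and `mutant_killed` are hypothetical, and the test command is passed in by the caller.

```python
import os
import subprocess
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def mutated(path, original, replacement):
    """Apply one textual mutation to a file and restore the file on exit."""
    path = Path(path)
    source = path.read_text()
    # Refuse ambiguous mutations so it is always clear which line changed.
    if source.count(original) != 1:
        raise ValueError(f"expected exactly one occurrence of {original!r}")
    path.write_text(source.replace(original, replacement))
    try:
        yield
    finally:
        # Restore the original source even if the test run raises.
        path.write_text(source)


def mutant_killed(path, original, replacement, test_cmd):
    """Return True if the test command fails while the mutation is applied."""
    # Skip bytecode caching so a stale .pyc of the mutant cannot be reused.
    env = {**os.environ, "PYTHONDONTWRITEBYTECODE": "1"}
    with mutated(path, original, replacement):
        result = subprocess.run(test_cmd, env=env, capture_output=True)
    return result.returncode != 0
```

A call might look like `mutant_killed("strategy.py", "weight < limit", "weight <= limit", [sys.executable, "-m", "pytest", "tests/test_strategy.py", "-q"])`, where both file paths and the mutated expression are placeholders. Note that any nonzero exit code counts as a kill here, which includes cases where the mutation breaks the import or where pytest collects no tests, so it is worth reading the pytest output before trusting the result.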

Common Weak Test Patterns

Our mutation testing efforts have revealed several common patterns of weak tests:

Testing Implementation, Not Behavior: Tests that are too tightly coupled to the implementation details of the code can be brittle and may not catch logic errors.
Insufficient Assertion Specificity: Tests that only check for high-level outcomes (e.g., that a function doesn't crash) may not detect incorrect calculations.
Inadequate Test Data: Tests that use simplistic or unrealistic data may not trigger edge cases or complex code paths.
Relying on Downstream Validation: Tests that depend on validation in other parts of the system to catch errors are not true unit tests and can mask weaknesses in the code under test.
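
To make two of these patterns concrete (insufficient assertion specificity and inadequate test data), here is a small illustrative sketch. Everything in it is hypothetical: `capped_weight` is not a function from this project, and the mutant is written out as a separate function only so that both versions can be compared side by side.

```python
import math


def capped_weight(value, total, cap=0.05):
    """Hypothetical calculation: a position's portfolio weight, limited to cap."""
    weight = value / total
    return cap if weight > cap else weight


def capped_weight_mutant(value, total, cap=0.05):
    """The same function with one arithmetic mutation: / became *."""
    weight = value * total
    return cap if weight > cap else weight


def weak_test(fn):
    # Checks only the type and a plausible range, using a single input
    # that lands above the cap, so the arithmetic is never observed.
    result = fn(10.0, 100.0)
    assert isinstance(result, float)
    assert 0.0 <= result <= 1.0


def strong_test(fn):
    # Pins exact values below the cap, at the cap, and above the cap.
    assert math.isclose(fn(1.0, 100.0), 0.01)
    assert math.isclose(fn(5.0, 100.0), 0.05)
    assert math.isclose(fn(10.0, 100.0), 0.05)


def survives(test, fn):
    """Return True if the test passes for fn, i.e. the mutant is not killed."""
    try:
        test(fn)
    except AssertionError:
        return False
    return True
```

Under these assumptions, `survives(weak_test, capped_weight_mutant)` is true because both versions return the cap for the only input tested, while `survives(strong_test, capped_weight_mutant)` is false because the below-cap case exposes the wrong arithmetic.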

By being mindful of these patterns, we can write more robust and effective tests from the outset.