
Conversation

@mfinean
Collaborator

@mfinean mfinean commented Dec 26, 2025

Summary

  • Increased Hypothesis test deadlines from 500-2000ms to 5000ms for all tests in test_job.py
  • The multiprocessing executor has significant startup overhead that was causing intermittent deadline failures; a minimal sketch of the deadline override follows below
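
A minimal sketch of the kind of override involved, assuming a plain Hypothesis test; the test name, strategy, and body are placeholders rather than the actual tests in test/test_job.py:

```python
# Hedged sketch: raise the Hypothesis per-example deadline so that
# multiprocessing executor startup is not counted as a deadline failure.
# The test name, strategy, and body below are illustrative placeholders.
from hypothesis import given, settings, strategies as st


@settings(deadline=5000)  # milliseconds; previously 500-2000ms
@given(n=st.integers(min_value=1, max_value=4))
def test_job_with_relaxed_deadline(n):
    assert n > 0
```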

Summary by Sourcery

Introduce a structured benchmark configuration system with composable sub-config classes and update tests and configuration access accordingly.

Bug Fixes:

  • Relax Hypothesis deadlines in job tests to avoid flaky failures caused by multiprocessing startup overhead.

Enhancements:

  • Add BenchRunCfg as a composable run-configuration that delegates parameters to sub-config objects while preserving flat, backward-compatible access.
  • Add BenchCfg as a higher-level benchmark configuration encapsulating variables, metadata, hashing, and sweep description utilities.
  • Provide dedicated cache, execution, display, visualization, time, server, and dims configuration classes to better organize benchmark settings.
  • Refactor the bench_cfg package into a proper subpackage with explicit exports and improve LaTeX/markdown description helpers for sweeps and results.

Tests:

  • Add comprehensive tests for sub-config defaults, BenchRunCfg composition/delegation, BenchCfg behavior, and backward-compatible imports.

@mfinean mfinean requested a review from blooop as a code owner December 26, 2025 23:42
@sourcery-ai
Contributor

sourcery-ai bot commented Dec 26, 2025

Reviewer's Guide

Refactors benchmark configuration into a new compositional bencher.bench_cfg package with sub-config classes and a delegating BenchRunCfg/BenchCfg, adds comprehensive tests for the new config API, and relaxes Hypothesis deadlines in multiprocessing-related tests to fix flakiness.

Sequence diagram for BenchRunCfg legacy flat parameter initialization

```mermaid
sequenceDiagram
    actor User
    participant CLI as CLI_argparse
    participant BenchRunCfg_cls as BenchRunCfg_class
    participant BenchRunCfg as BenchRunCfg_instance
    participant CacheCfg as cache
    participant ExecutionCfg as execution
    participant DisplayCfg as display
    participant VisualizationCfg as visualization
    participant TimeCfg as time
    participant BenchPlotSrvCfg as server

    User->>CLI: run program with args
    CLI->>BenchRunCfg_cls: from_cmd_line()
    BenchRunCfg_cls->>CLI: build ArgumentParser
    CLI-->>User: parse command line args
    CLI->>BenchRunCfg_cls: return parsed Namespace

    BenchRunCfg_cls->>BenchRunCfg_cls: construct params
    Note over BenchRunCfg_cls: cache_results, only_plot, port, nightly, time_event, repeats

    BenchRunCfg_cls->>BenchRunCfg: __init__(cache_results, only_plot, port, nightly, time_event, repeats)

    BenchRunCfg->>BenchPlotSrvCfg: create default instance
    BenchRunCfg->>CacheCfg: create default instance
    BenchRunCfg->>ExecutionCfg: create default instance
    BenchRunCfg->>DisplayCfg: create default instance
    BenchRunCfg->>VisualizationCfg: create default instance
    BenchRunCfg->>TimeCfg: create default instance

    BenchRunCfg->>BenchRunCfg: for each flat param use _DELEGATION_MAP
    BenchRunCfg->>CacheCfg: set cache_results, only_plot
    BenchRunCfg->>BenchPlotSrvCfg: set port
    BenchRunCfg->>ExecutionCfg: set repeats
    BenchRunCfg->>ExecutionCfg: set nightly
    BenchRunCfg->>TimeCfg: set time_event

    BenchRunCfg-->>BenchRunCfg_cls: initialized BenchRunCfg_instance
    BenchRunCfg_cls-->>User: BenchRunCfg_instance with sub-configs and flat access
```

Class diagram for new benchmark configuration package

```mermaid
classDiagram
    class BenchPlotSrvCfg {
        <<param.Parameterized>>
        port : int
        allow_ws_origin : bool
        show : bool
    }

    class CacheCfg {
        <<param.Parameterized>>
        cache_results : bool
        clear_cache : bool
        cache_samples : bool
        only_hash_tag : bool
        clear_sample_cache : bool
        overwrite_sample_cache : bool
        only_plot : bool
    }

    class ExecutionCfg {
        <<param.Parameterized>>
        repeats : int
        level : int
        executor
        nightly : bool
        headless : bool
    }

    class DisplayCfg {
        <<param.Parameterized>>
        summarise_constant_inputs : bool
        print_bench_inputs : bool
        print_bench_results : bool
        print_pandas : bool
        print_xarray : bool
        serve_pandas : bool
        serve_pandas_flat : bool
        serve_xarray : bool
    }

    class VisualizationCfg {
        <<param.Parameterized>>
        auto_plot : bool
        use_holoview : bool
        use_optuna : bool
        render_plotly : bool
        raise_duplicate_exception : bool
        plot_size : int
        plot_width : int
        plot_height : int
    }

    class TimeCfg {
        <<param.Parameterized>>
        over_time : bool
        clear_history : bool
        time_event : str
        run_tag : str
        run_date : datetime
    }

    class BenchRunCfg {
        <<param.Parameterized>>
        server : BenchPlotSrvCfg
        cache : CacheCfg
        execution : ExecutionCfg
        display : DisplayCfg
        visualization : VisualizationCfg
        time : TimeCfg
        +BenchRunCfg(**params)
        +from_cmd_line() BenchRunCfg
        +deep() BenchRunCfg
        +__getattr__(name)
        +__setattr__(name, value)
    }

    class BenchCfg {
        <<BenchRunCfg>>
        input_vars : List
        result_vars : List
        const_vars : List
        result_hmaps : List
        meta_vars : List
        all_vars : List
        iv_time : List
        iv_time_event : List
        name : str
        title : str
        raise_duplicate_exception : bool
        bench_name : str
        description : str
        post_description : str
        has_results : bool
        pass_repeat : bool
        tag : str
        hash_value : str
        plot_callbacks : List
        +BenchCfg(**params)
        +hash_persistent(include_repeats) str
        +inputs_as_str() List~str~
        +to_latex()
        +describe_sweep(width, accordion)
        +sweep_sentence()
        +describe_benchmark() str
        +to_title(panel_name)
        +to_description(width)
        +to_post_description(width)
        +to_sweep_summary(name, description, describe_sweep, results_suffix, title)
        +optuna_targets(as_var) List
    }

    class DimsCfg {
        dims_name : List~str~
        dim_ranges : List~List~
        dims_size : List~int~
        dim_ranges_index : List~List~int~~
        dim_ranges_str : List~str~
        coords : Dict
        +DimsCfg(bench_cfg)
    }

    BenchRunCfg <|-- BenchCfg

    BenchRunCfg *-- BenchPlotSrvCfg : server
    BenchRunCfg *-- CacheCfg : cache
    BenchRunCfg *-- ExecutionCfg : execution
    BenchRunCfg *-- DisplayCfg : display
    BenchRunCfg *-- VisualizationCfg : visualization
    BenchRunCfg *-- TimeCfg : time

    DimsCfg ..> BenchCfg : uses

    class bench_cfg_package {
    }
    bench_cfg_package o-- BenchPlotSrvCfg
    bench_cfg_package o-- CacheCfg
    bench_cfg_package o-- ExecutionCfg
    bench_cfg_package o-- DisplayCfg
    bench_cfg_package o-- VisualizationCfg
    bench_cfg_package o-- TimeCfg
    bench_cfg_package o-- BenchRunCfg
    bench_cfg_package o-- BenchCfg
    bench_cfg_package o-- DimsCfg
```

File-Level Changes

| Change | Details | Files |
| --- | --- | --- |
| Relax Hypothesis deadlines in job tests to reduce flakiness with multiprocessing executor startup overhead. | • Increase deadline for basic job test to accommodate multiprocessing startup.<br>• Increase deadline for overwrite job test to accommodate multiprocessing startup.<br>• Increase deadline for parallel bench runner test to accommodate multiprocessing startup. | test/test_job.py |
| Introduce a compositional benchmark run configuration (BenchRunCfg) that delegates to dedicated sub-config classes while maintaining backward-compatible flat parameter access. | • Define sub-config classes for cache, execution, display, visualization, time, and server settings.<br>• Implement BenchRunCfg that owns sub-config instances and delegates attribute get/set based on a generated delegation map.<br>• Support both grouped (cfg.cache.cache_results) and flat (cfg.cache_results) parameter access, including constructor routing of legacy flat parameters.<br>• Provide a deep() method and from_cmd_line() factory for creating configurations via CLI arguments. | bencher/bench_cfg/run_cfg.py<br>bencher/bench_cfg/cache_cfg.py<br>bencher/bench_cfg/execution_cfg.py<br>bencher/bench_cfg/display_cfg.py<br>bencher/bench_cfg/visualization_cfg.py<br>bencher/bench_cfg/time_cfg.py<br>bencher/bench_cfg/server_cfg.py |
| Add BenchCfg as the full benchmark configuration with hashing, description, and panel/LaTeX rendering helpers built on BenchRunCfg. | • Implement BenchCfg with input/result/constant/meta variable lists and metadata fields such as name, title, tags, and flags.<br>• Override hash_persistent to generate stable hashes from key configuration and variables, with optional repeat sensitivity.<br>• Add helpers to generate LaTeX, markdown descriptions, sweep summaries, and panel layouts for benchmark configuration and results.<br>• Provide convenience methods such as inputs_as_str, sweep_sentence, describe_benchmark, and optuna_targets. | bencher/bench_cfg/bench_cfg_class.py |
| Introduce DimsCfg helper for extracting dimension metadata from BenchCfg for downstream use. | • Derive dimension names, ranges, sizes, indices, and coordinate mappings from a BenchCfg instance.<br>• Log derived dimension metadata for debugging. | bencher/bench_cfg/dims_cfg.py |
| Create the bencher.bench_cfg package API surface for configuration classes and remove the legacy flat module. | • Expose sub-config classes, BenchRunCfg, BenchCfg, and DimsCfg through the bencher.bench_cfg `__all__` for import stability.<br>• Delete legacy bencher/bench_cfg.py in favor of the new package-based layout. | bencher/bench_cfg/__init__.py<br>bencher/bench_cfg.py |
| Add tests to verify behavior, composition, and backward compatibility of the new configuration system and imports. | • Test defaults and behavior of each sub-config class.<br>• Test BenchRunCfg composition, flat vs grouped access, constructor routing, synchronization, and deep copying semantics.<br>• Test BenchCfg inheritance of delegation, hash_persistent behavior, description output, and over_time handling.<br>• Test backward compatibility of importing config classes from both bencher.bench_cfg and the bencher top-level namespace. | test/test_bench_cfg.py |
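
To make the delegation behaviour concrete, here is an illustrative usage sketch based on the access patterns described above; the exact constructor behaviour and defaults are assumed rather than taken from the diff:

```python
# Illustrative sketch only: BenchRunCfg is described as routing legacy flat
# kwargs to sub-configs and exposing both flat and grouped access.
from bencher.bench_cfg import BenchRunCfg

cfg = BenchRunCfg(cache_results=True, repeats=3)  # legacy flat kwargs

# flat (backward-compatible) access and grouped access resolve to the same value
assert cfg.cache_results and cfg.cache.cache_results
assert cfg.repeats == cfg.execution.repeats == 3
```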

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.


Contributor

@sourcery-ai sourcery-ai bot left a comment


Hey - I've found 10 issues and left some high-level feedback:

  • Several param fields are declared with default=None but without allow_None=True (e.g. TimeCfg.time_event, VisualizationCfg.plot_size/plot_width/plot_height, BenchCfg.name/title/...), which can conflict with param’s type constraints; consider explicitly enabling allow_None or using non-None defaults.
  • _DELEGATION_MAP is built by blindly merging all sub-config param names, so a duplicated parameter name across sub-configs will silently map to whichever class happens to be processed last; to avoid subtle bugs, add a collision check or make the owning sub-config explicit when conflicts occur.
  • In BenchCfg.hash_persistent, input_vars, result_vars, and const_vars are assumed to be non-None and iterable; to make the config more robust to partial construction, guard these attributes (e.g. treat None as an empty list) before iterating over them.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- Several param fields are declared with `default=None` but without `allow_None=True` (e.g. `TimeCfg.time_event`, `VisualizationCfg.plot_size/plot_width/plot_height`, `BenchCfg.name/title/...`), which can conflict with `param`’s type constraints; consider explicitly enabling `allow_None` or using non-None defaults.
- _DELEGATION_MAP is built by blindly merging all sub-config param names, so a duplicated parameter name across sub-configs will silently map to whichever class happens to be processed last; to avoid subtle bugs, add a collision check or make the owning sub-config explicit when conflicts occur.
- In `BenchCfg.hash_persistent`, `input_vars`, `result_vars`, and `const_vars` are assumed to be non-None and iterable; to make the config more robust to partial construction, guard these attributes (e.g. treat `None` as an empty list) before iterating over them.

## Individual Comments

### Comment 1
<location> `bencher/bench_cfg/bench_cfg_class.py:20-29` </location>
<code_context>
-        plot_callbacks (List): Callables that take a BenchResult and return panel representation
-    """
-
-    input_vars = param.List(
-        default=None,
-        doc="A list of ParameterizedSweep variables to perform a parameter sweep over",
-    )
-    result_vars = param.List(
-        default=None,
-        doc="A list of ParameterizedSweep results collect and plot.",
-    )
-
-    const_vars = param.List(
-        default=None,
-        doc="Variables to keep constant but are different from the default value",
-    )
-
-    result_hmaps = param.List(default=None, doc="a list of holomap results")
-
-    meta_vars = param.List(
</code_context>

<issue_to_address>
**issue (bug_risk):** Using `default=None` for list params but treating them as sequences later risks runtime errors.

Parameters like `input_vars`, `result_vars`, `const_vars`, `result_hmaps`, and `meta_vars` default to `None` but are later used as if they are always lists (e.g. `self.input_vars + self.result_vars`, `for v in self.const_vars`). If left as `None`, this will raise `TypeError`. Please either give them `[]` defaults or normalize `None` to empty lists in `__init__` before they are used.
</issue_to_address>
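
A minimal, self-contained sketch of the normalization suggested here; the class below only mirrors the parameter names from the review and is not the PR's actual BenchCfg:

```python
import param


class BenchCfgSketch(param.Parameterized):
    """Sketch only: demonstrates normalizing None list params, not the real BenchCfg."""

    input_vars = param.List(default=None, allow_None=True)
    result_vars = param.List(default=None, allow_None=True)
    const_vars = param.List(default=None, allow_None=True)
    result_hmaps = param.List(default=None, allow_None=True)
    meta_vars = param.List(default=None, allow_None=True)

    def __init__(self, **params):
        super().__init__(**params)
        # Treat None as an empty list so later code such as
        # `self.input_vars + self.result_vars` or `for v in self.const_vars`
        # cannot raise TypeError.
        for pname in ("input_vars", "result_vars", "const_vars", "result_hmaps", "meta_vars"):
            if getattr(self, pname) is None:
                setattr(self, pname, [])
```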

### Comment 2
<location> `bencher/bench_cfg/time_cfg.py:33-34` </location>
<code_context>
-        doc="Define a tag for a run to isolate the results stored in the cache from other runs",
-    )
-
-    run_date: datetime = param.Date(
-        default=datetime.now(),
-        doc="The date the bench run was performed",
-    )
</code_context>

<issue_to_address>
**issue (bug_risk):** `run_date` default is evaluated at import time, not at instance creation.

Because `default=datetime.now()` is evaluated at import time, all `TimeCfg` instances in a process will share the same `run_date`, which is likely incorrect. To capture per-instance creation time, use a callable default (e.g. `param.Date(default=datetime.now)` if supported) or set `run_date` in `__init__` when it’s not provided. Otherwise this field will be inaccurate for long‑running processes.
</issue_to_address>
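
One possible shape of that fix, sketched with param; the real TimeCfg may differ:

```python
from datetime import datetime

import param


class TimeCfgSketch(param.Parameterized):
    """Sketch of the suggested fix only, not the PR's TimeCfg."""

    run_date = param.Date(default=None, doc="The date the bench run was performed")

    def __init__(self, **params):
        super().__init__(**params)
        if self.run_date is None:
            # Evaluated per instance, unlike `default=datetime.now()`,
            # which is evaluated once at import time.
            self.run_date = datetime.now()
```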

### Comment 3
<location> `bencher/bench_cfg/bench_cfg_class.py:60-61` </location>
<code_context>
-    )
-    name: Optional[str] = param.String(None, doc="The name of the benchmarkCfg")
-    title: Optional[str] = param.String(None, doc="The title of the benchmark")
-    raise_duplicate_exception: bool = param.Boolean(
-        False, doc="Use this while debugging if filename generation is unique"
-    )
-    bench_name: Optional[str] = param.String(
</code_context>

<issue_to_address>
**question (bug_risk):** Duplicate `raise_duplicate_exception` parameter name may conflict with the visualization config.

Because `BenchCfg` inherits from `BenchRunCfg` and delegates sub-config parameters, defining `raise_duplicate_exception` here may clash with `VisualizationCfg.raise_duplicate_exception`, affecting how `cfg.raise_duplicate_exception` is resolved vs `cfg.visualization.raise_duplicate_exception`. Consider either relying on the visualization parameter or renaming this one to avoid ambiguity and accidental shadowing.
</issue_to_address>

### Comment 4
<location> `bencher/bench_cfg/run_cfg.py:31-40` </location>
<code_context>
+_PARAM_INTERNALS = {"name"}
+
+
+def _build_delegation_map():
+    """Build delegation map from sub-config param definitions."""
+    delegation_map = {}
+    for sub_name, cfg_cls in _SUBCONFIG_CLASSES.items():
+        for pname in cfg_cls.param:
+            if pname in _PARAM_INTERNALS:
+                continue
+            delegation_map[pname] = sub_name
+    return delegation_map
+
+
+_DELEGATION_MAP = _build_delegation_map()
+
+
</code_context>

<issue_to_address>
**suggestion (bug_risk):** Delegation map currently relies on param name uniqueness across all sub-configs.

Because all param names are merged into one flat map, any name collision between sub-configs will be silently resolved in favor of the last processed sub-config. To avoid subtle routing bugs, consider either failing fast on collisions when building `_DELEGATION_MAP` or clearly documenting/enforcing global param-name uniqueness across sub-configs.
</issue_to_address>
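
A sketch of the fail-fast variant, reusing the `_SUBCONFIG_CLASSES` and `_PARAM_INTERNALS` names from the code context above; the exact error handling is a suggestion, not the PR's implementation:

```python
def _build_delegation_map():
    """Build delegation map from sub-config param definitions, failing fast on collisions."""
    delegation_map = {}
    for sub_name, cfg_cls in _SUBCONFIG_CLASSES.items():
        for pname in cfg_cls.param:
            if pname in _PARAM_INTERNALS:
                continue
            if pname in delegation_map:
                # A duplicate name would otherwise be silently routed to the
                # last-processed sub-config.
                raise ValueError(
                    f"Parameter {pname!r} is defined by both the "
                    f"{delegation_map[pname]!r} and {sub_name!r} sub-configs"
                )
            delegation_map[pname] = sub_name
    return delegation_map
```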

### Comment 5
<location> `test/test_bench_cfg.py:57` </location>
<code_context>
+        self.assertTrue(cfg.show)
+
+
+class TestBenchRunCfgComposition(unittest.TestCase):
+    """Test BenchRunCfg composition and delegation."""
+
</code_context>

<issue_to_address>
**suggestion (testing):** Add tests covering delegation of unknown/non-delegated attributes to ensure `__getattr__`/`__setattr__` do not mask AttributeErrors.

Existing tests only cover known delegated attributes; they don’t verify that truly unknown attributes still raise `AttributeError` instead of being intercepted by the delegation layer. Please add a test (e.g. `with self.assertRaises(AttributeError): _ = cfg.some_unknown_param`) to guard `__getattr__`/`__setattr__` behavior against regressions.

Suggested implementation:

```python
        self.assertIsNone(cfg.port)
        self.assertTrue(cfg.show)


class TestBenchRunCfgComposition(unittest.TestCase):
    """Test BenchRunCfg composition and delegation."""

    def test_unknown_attribute_get_raises_attribute_error(self):
        cfg = BenchRunCfg()
        with self.assertRaises(AttributeError):
            _ = cfg.some_unknown_param

    def test_unknown_attribute_set_raises_attribute_error(self):
        cfg = BenchRunCfg()
        with self.assertRaises(AttributeError):
            cfg.some_unknown_param = 42



```

```python
from bencher.bench_cfg import (
    BenchRunCfg,
    BenchPlotSrvCfg,
    CacheCfg,
    ExecutionCfg,
    DisplayCfg,
    VisualizationCfg,
    TimeCfg,
)

```
</issue_to_address>

### Comment 6
<location> `test/test_bench_cfg.py:156-166` </location>
<code_context>
+        )
+        self.assertTrue(cfg.over_time)
+
+    def test_hash_persistent_varies_with_repeats(self):
+        """hash_persistent changes when repeats differs."""
+        cfg1 = BenchCfg(
</code_context>

<issue_to_address>
**suggestion (testing):** Extend `hash_persistent` tests to cover non-empty vars and const_vars so hashing of configuration content is validated.

This test currently only exercises how `repeats` affects `hash_persistent`, with `input_vars`, `result_vars`, and `const_vars` all empty. Since the implementation also hashes these collections, please add a test using simple stub variable objects (with a known `hash_persistent()` and `.name`) to verify that changes to inputs, constants, or results also change the hash. That will validate the new hashing is content-sensitive, not just repeat-sensitive.

```suggestion
        cfg = BenchCfg(
            input_vars=[],
            result_vars=[],
            const_vars=[],
            bench_name="test",
            title="test",
            over_time=True,
        )
        self.assertTrue(cfg.over_time)

    class _StubVar:
        def __init__(self, name, persistent_hash):
            self.name = name
            self._persistent_hash = persistent_hash

        def hash_persistent(self):
            return self._persistent_hash

    def test_hash_persistent_varies_with_vars_and_const_vars(self):
        """hash_persistent changes when input/result/const vars differ."""
        base_input = [self._StubVar("in", "in-1")]
        base_const = [self._StubVar("const", "const-1")]
        base_result = [self._StubVar("res", "res-1")]

        cfg_base = BenchCfg(
            input_vars=base_input,
            result_vars=base_result,
            const_vars=base_const,
            bench_name="test",
            title="test",
            repeats=1,
        )

        cfg_input_changed = BenchCfg(
            input_vars=[self._StubVar("in", "in-2")],
            result_vars=base_result,
            const_vars=base_const,
            bench_name="test",
            title="test",
            repeats=1,
        )

        cfg_const_changed = BenchCfg(
            input_vars=base_input,
            result_vars=base_result,
            const_vars=[self._StubVar("const", "const-2")],
            bench_name="test",
            title="test",
            repeats=1,
        )

        cfg_result_changed = BenchCfg(
            input_vars=base_input,
            result_vars=[self._StubVar("res", "res-2")],
            const_vars=base_const,
            bench_name="test",
            title="test",
            repeats=1,
        )

        base_hash = cfg_base.hash_persistent()
        self.assertNotEqual(base_hash, cfg_input_changed.hash_persistent())
        self.assertNotEqual(base_hash, cfg_const_changed.hash_persistent())
        self.assertNotEqual(base_hash, cfg_result_changed.hash_persistent())

    def test_hash_persistent_varies_with_repeats(self):
```
</issue_to_address>

### Comment 7
<location> `test/test_bench_cfg.py:189` </location>
<code_context>
+        # With include_repeats=False, hashes should be equal
+        self.assertEqual(cfg1.hash_persistent(False), cfg2.hash_persistent(False))
+
+    def test_describe_benchmark_includes_delegated_fields(self):
+        """describe_benchmark output includes delegated config values."""
+        cfg = BenchCfg(
</code_context>

<issue_to_address>
**suggestion (testing):** Broaden `describe_benchmark` coverage to include non-empty input/result/const/meta variables.

This currently only verifies delegated fields like `level`, `cache_results`, and `cache_samples`. Please add a test that supplies minimal stub entries for `input_vars`, `const_vars`, `result_vars`, and `meta_vars`, and asserts their descriptions appear in the output, so those branches remain covered as the config classes change.

Suggested implementation:

```python
        # With include_repeats=False, hashes should be equal
        self.assertEqual(cfg1.hash_persistent(False), cfg2.hash_persistent(False))

    def test_describe_benchmark_includes_delegated_and_variable_fields(self):
        """describe_benchmark output includes delegated and variable config values."""

        # Minimal stub variable entries to exercise describe_benchmark branches.
        # Replace DummyVar with the concrete var config types used in the codebase
        # (e.g. InputVarCfg/ConstVarCfg/ResultVarCfg/MetaVarCfg).
        @dataclasses.dataclass
        class DummyVar:
            name: str
            description: str

        input_var = DummyVar(name="x_input", description="input variable x")
        const_var = DummyVar(name="c_const", description="constant c")
        result_var = DummyVar(name="r_result", description="result r")
        meta_var = DummyVar(name="m_meta", description="meta m")

        cfg = BenchCfg(
            input_vars=[input_var],
            result_vars=[result_var],
            const_vars=[const_var],
            meta_vars=[meta_var],
            bench_name="test_bench",
            title="Test benchmark",
            cache_results=True,
            cache_samples=True,
            repeats=3,
        )

        description = cfg.describe_benchmark()

        # Delegated fields still covered
        self.assertIn("test_bench", description)
        self.assertIn("Test benchmark", description)
        self.assertIn("cache_results", description)
        self.assertIn("cache_samples", description)
        self.assertIn("repeats", description)

        # Variable fields: ensure each stub variable is represented in output
        self.assertIn("x_input", description)
        self.assertIn("input variable x", description)

        self.assertIn("c_const", description)
        self.assertIn("constant c", description)

        self.assertIn("r_result", description)
        self.assertIn("result r", description)

        self.assertIn("m_meta", description)
        self.assertIn("meta m", description)

```

1. Ensure `import dataclasses` (or `from dataclasses import dataclass`) is present at the top of `test/test_bench_cfg.py`. If `dataclasses` is already imported, you can drop the `dataclasses.` prefix and just use `@dataclass`.
2. Replace the `DummyVar` dataclass with the real config classes used for `input_vars`, `const_vars`, `result_vars`, and `meta_vars` (for example `InputVarCfg`, `ConstVarCfg`, `ResultVarCfg`, `MetaVarCfg`), and adjust constructor arguments accordingly.
3. If `describe_benchmark()` formats variable descriptions differently (e.g. prefixes like `input:` or shows only names), tweak the `assertIn` expectations to match the actual output shape while still ensuring that non-empty input/const/result/meta entries exercise their respective code paths.
</issue_to_address>

### Comment 8
<location> `test/test_bench_cfg.py:209` </location>
<code_context>
+        self.assertIn("cache_samples True", description)
+
+
+class TestBackwardCompatibility(unittest.TestCase):
+    """Test backward compatibility of imports."""
+
</code_context>

<issue_to_address>
**suggestion (testing):** Consider adding behavior-level backward-compatibility tests, not just import-level ones.

Right now these tests only ensure legacy modules still export the classes. It would be more robust to also construct objects through the legacy imports (e.g. `BenchRunCfg(cache_results=True)` from the top-level `bencher`) and assert that both the legacy flat attributes and the new grouped ones (e.g. `cfg.cache_results` and `cfg.cache.cache_results`) behave identically. That way, refactors in `bench_cfg` are less likely to break existing user code.

Suggested implementation:

```python
import unittest
from datetime import datetime

from bencher import BenchRunCfg
from bencher.bench_cfg import (
    BenchPlotSrvCfg,
    CacheCfg,
    ExecutionCfg,
    DisplayCfg,
    VisualizationCfg,
    TimeCfg,
)

```

```python
class TestBackwardCompatibility(unittest.TestCase):
    """Test backward compatibility of imports and behavior."""

    def test_imports(self):
        """Legacy imports from bencher.bench_cfg remain available."""
        # The import-level check is done via the module-level imports above.
        # This test provides an explicit assertion so regressions are reported clearly.
        self.assertIsNotNone(BenchPlotSrvCfg)
        self.assertIsNotNone(CacheCfg)
        self.assertIsNotNone(ExecutionCfg)
        self.assertIsNotNone(DisplayCfg)
        self.assertIsNotNone(VisualizationCfg)
        self.assertIsNotNone(TimeCfg)

    def test_legacy_benchruncfg_cache_attributes(self):
        """Ensure legacy flat cache attributes and new grouped ones behave identically."""
        # Construct via the legacy top-level import to mimic existing user code.
        cfg = BenchRunCfg(cache_results=True, cache_samples=True)

        # Legacy flat attributes must still be present.
        self.assertTrue(cfg.cache_results)
        self.assertTrue(cfg.cache_samples)

        # New grouped cache configuration should be kept in sync.
        self.assertTrue(cfg.cache.cache_results)
        self.assertTrue(cfg.cache.cache_samples)

        # The values exposed via legacy and grouped attributes must match.
        self.assertEqual(cfg.cache_results, cfg.cache.cache_results)
        self.assertEqual(cfg.cache_samples, cfg.cache.cache_samples)

```
</issue_to_address>

### Comment 9
<location> `test/test_bench_cfg.py:18` </location>
<code_context>
+)
+
+
+class TestSubConfigClasses(unittest.TestCase):
+    """Test individual sub-configuration classes."""
+
</code_context>

<issue_to_address>
**suggestion (testing):** Add tests that verify sub-config defaults interact correctly when wired through `BenchRunCfg`.

Current tests cover each sub-config’s defaults and `TestBenchRunCfgComposition` covers delegation, but nothing verifies that a new `BenchRunCfg()` exposes those defaults correctly via its flat API (e.g. `BenchRunCfg().cache_results` is `False`, `BenchRunCfg().over_time` is `False`, etc.). Please add a test that instantiates `BenchRunCfg()` and asserts a representative subset of these flat attributes matches the corresponding sub-config defaults to guard against wiring regressions.
</issue_to_address>

### Comment 10
<location> `bencher/bench_cfg/run_cfg.py:18` </location>
<code_context>
+from bencher.bench_cfg.time_cfg import TimeCfg
+
+# Sub-config classes mapped to their attribute names
+_SUBCONFIG_CLASSES = {
+    "server": BenchPlotSrvCfg,
+    "cache": CacheCfg,
</code_context>

<issue_to_address>
**issue (complexity):** Consider replacing the reflection-built delegation map with a fixed, explicitly listed mapping of supported legacy flat parameters to sub-configs to simplify and clarify the API surface.

You can drop the reflection-based delegation and make the mapping explicit while keeping all current behavior (legacy flat kwargs, delegation, ctor routing).

Instead of `_SUBCONFIG_CLASSES`, `_PARAM_INTERNALS`, and `_build_delegation_map()`, define a static map of the legacy flat names you actually want to support:

```python
# Explicit, controlled mapping of legacy flat names → sub-config attribute
_DELEGATION_MAP = {
    # cache
    "cache_results": "cache",
    "only_plot": "cache",
    # execution
    "repeats": "execution",
    # display
    "some_display_param": "display",
    # visualization
    "some_vis_param": "visualization",
    # time
    "time_event": "time",
    # server
    "port": "server",
    # ...add only the legacy names you intentionally support
}
```

Then you can remove:

```python
_SUBCONFIG_CLASSES = { ... }
_PARAM_INTERNALS = {"name"}

def _build_delegation_map():
    ...
_DELEGATION_MAP = _build_delegation_map()
```

All existing logic in `__init__`, `__getattr__`, and `__setattr__` continues to work unchanged, but:

- Adding a new param to a sub-config no longer silently creates a new top-level attribute on `BenchRunCfg`.
- The delegated surface API is visible and grep-able in one place.
- Coupling is local to `BenchRunCfg` instead of driven by external `param` definitions.

If you want to make the intent even clearer without changing behavior, you can also add a short comment at the map definition pointing out that it defines the legacy flat API:

```python
# Legacy flat API maintained for backward compatibility.
# New code should prefer cfg.<subconfig>.<param> access.
_DELEGATION_MAP = { ... }
```
</issue_to_address>



@codecov

codecov bot commented Dec 26, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 89.39%. Comparing base (685cb99) to head (a63e6af).
⚠️ Report is 3 commits behind head on main.

Additional details and impacted files

Impacted file tree graph

```
@@           Coverage Diff           @@
##             main     #689   +/-   ##
=======================================
  Coverage   89.39%   89.39%
=======================================
  Files          91       91
  Lines        4432     4432
=======================================
  Hits         3962     3962
  Misses        470      470
```



@blooop
Owner

blooop commented Dec 26, 2025

Are these new flaky tests or old tests? The existing tests are not flaky, so if this is increasing the deadlines on old tests it indicates a problem with the implementation. (Sorry, on phone so it's difficult to see the trace of the test heritage)

@mfinean
Collaborator Author

mfinean commented Dec 27, 2025

Are these new flaky tests or old tests? The existing tests are not flaky, so if this is increasing the deadlines on old tests it indicates a problem with the implementation. (Sorry, on phone so it's difficult to see the trace of the test heritage)

I'm fairly sure they're existing flaky tests.

@blooop
Owner

blooop commented Dec 27, 2025

Flaky on CI or your machine? I ironed out all the flaky tests a long time ago.

@blooop
Owner

blooop commented Dec 27, 2025

My meaning being: if they have become flaky now, it indicates a performance regression.

@mfinean
Collaborator Author

mfinean commented Dec 27, 2025

Ah, interesting. Right, so CI runs on Linux (platforms = ["linux-64"] in pyproject.toml).
I'm currently working on macOS, which uses spawn for multiprocessing (reimporting all modules in each subprocess). Linux uses fork, which is much faster. So the ~2000ms overhead is macOS multiprocessing startup, not a code regression.

Interesting though. Could be worth increasing the deadlines on these tests anyway, and/or testing on different platforms?
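
For reference, the platform difference is visible with the standard library alone:

```python
# "fork" on Linux, "spawn" on macOS (the default there since Python 3.8) and
# Windows; spawn re-imports modules in each subprocess, hence the startup cost.
import multiprocessing as mp

if __name__ == "__main__":
    print(mp.get_start_method())
```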

@blooop
Owner

blooop commented Dec 27, 2025

Ah ok, that doesn't sound like a performance regression after all then, and yes, we could add Mac to the test matrix.

@mfinean mfinean force-pushed the fix/increase-hypothesis-test-deadlines branch from f1ffbb9 to a63e6af Compare December 27, 2025 00:10
@mfinean
Collaborator Author

mfinean commented Dec 27, 2025

Just updated - it was based off the other PR. This was meant to be a standalone test fix.

@blooop
Owner

blooop commented Dec 27, 2025

Yeah, but given it was based on the previous PR, which changed cache/hash behaviour, I wondered if those changes were causing performance regressions. It sounds like that is not the case though, and is related to platform differences.

@blooop blooop merged commit f02edb9 into blooop:main Dec 27, 2025
8 of 9 checks passed