Fix flaky hypothesis tests in test_job.py #689
Conversation
Reviewer's Guide

Refactors benchmark configuration into a new compositional bencher.bench_cfg package with sub-config classes and a delegating BenchRunCfg/BenchCfg, adds comprehensive tests for the new config API, and relaxes Hypothesis deadlines in multiprocessing-related tests to fix flakiness.

Sequence diagram for BenchRunCfg legacy flat parameter initialization

```mermaid
sequenceDiagram
actor User
participant CLI as CLI_argparse
participant BenchRunCfg_cls as BenchRunCfg_class
participant BenchRunCfg as BenchRunCfg_instance
participant CacheCfg as cache
participant ExecutionCfg as execution
participant DisplayCfg as display
participant VisualizationCfg as visualization
participant TimeCfg as time
participant BenchPlotSrvCfg as server
User->>CLI: run program with args
CLI->>BenchRunCfg_cls: from_cmd_line()
BenchRunCfg_cls->>CLI: build ArgumentParser
CLI-->>User: parse command line args
CLI->>BenchRunCfg_cls: return parsed Namespace
BenchRunCfg_cls->>BenchRunCfg_cls: construct params
Note over BenchRunCfg_cls: cache_results, only_plot, port, nightly, time_event, repeats
BenchRunCfg_cls->>BenchRunCfg: __init__(cache_results, only_plot, port, nightly, time_event, repeats)
BenchRunCfg->>BenchPlotSrvCfg: create default instance
BenchRunCfg->>CacheCfg: create default instance
BenchRunCfg->>ExecutionCfg: create default instance
BenchRunCfg->>DisplayCfg: create default instance
BenchRunCfg->>VisualizationCfg: create default instance
BenchRunCfg->>TimeCfg: create default instance
BenchRunCfg->>BenchRunCfg: for each flat param use _DELEGATION_MAP
BenchRunCfg->>CacheCfg: set cache_results, only_plot
BenchRunCfg->>BenchPlotSrvCfg: set port
BenchRunCfg->>ExecutionCfg: set repeats
BenchRunCfg->>ExecutionCfg: set nightly
BenchRunCfg->>TimeCfg: set time_event
BenchRunCfg-->>BenchRunCfg_cls: initialized BenchRunCfg_instance
BenchRunCfg_cls-->>User: BenchRunCfg_instance with sub-configs and flat access
```

Class diagram for new benchmark configuration package

```mermaid
classDiagram
class BenchPlotSrvCfg {
<<param.Parameterized>>
port : int
allow_ws_origin : bool
show : bool
}
class CacheCfg {
<<param.Parameterized>>
cache_results : bool
clear_cache : bool
cache_samples : bool
only_hash_tag : bool
clear_sample_cache : bool
overwrite_sample_cache : bool
only_plot : bool
}
class ExecutionCfg {
<<param.Parameterized>>
repeats : int
level : int
executor
nightly : bool
headless : bool
}
class DisplayCfg {
<<param.Parameterized>>
summarise_constant_inputs : bool
print_bench_inputs : bool
print_bench_results : bool
print_pandas : bool
print_xarray : bool
serve_pandas : bool
serve_pandas_flat : bool
serve_xarray : bool
}
class VisualizationCfg {
<<param.Parameterized>>
auto_plot : bool
use_holoview : bool
use_optuna : bool
render_plotly : bool
raise_duplicate_exception : bool
plot_size : int
plot_width : int
plot_height : int
}
class TimeCfg {
<<param.Parameterized>>
over_time : bool
clear_history : bool
time_event : str
run_tag : str
run_date : datetime
}
class BenchRunCfg {
<<param.Parameterized>>
server : BenchPlotSrvCfg
cache : CacheCfg
execution : ExecutionCfg
display : DisplayCfg
visualization : VisualizationCfg
time : TimeCfg
+BenchRunCfg(**params)
+from_cmd_line() BenchRunCfg
+deep() BenchRunCfg
+__getattr__(name)
+__setattr__(name, value)
}
class BenchCfg {
<<BenchRunCfg>>
input_vars : List
result_vars : List
const_vars : List
result_hmaps : List
meta_vars : List
all_vars : List
iv_time : List
iv_time_event : List
name : str
title : str
raise_duplicate_exception : bool
bench_name : str
description : str
post_description : str
has_results : bool
pass_repeat : bool
tag : str
hash_value : str
plot_callbacks : List
+BenchCfg(**params)
+hash_persistent(include_repeats) str
+inputs_as_str() List~str~
+to_latex()
+describe_sweep(width, accordion)
+sweep_sentence()
+describe_benchmark() str
+to_title(panel_name)
+to_description(width)
+to_post_description(width)
+to_sweep_summary(name, description, describe_sweep, results_suffix, title)
+optuna_targets(as_var) List
}
class DimsCfg {
dims_name : List~str~
dim_ranges : List~List~
dims_size : List~int~
dim_ranges_index : List~List~int~~
dim_ranges_str : List~str~
coords : Dict
+DimsCfg(bench_cfg)
}
BenchRunCfg <|-- BenchCfg
BenchRunCfg *-- BenchPlotSrvCfg : server
BenchRunCfg *-- CacheCfg : cache
BenchRunCfg *-- ExecutionCfg : execution
BenchRunCfg *-- DisplayCfg : display
BenchRunCfg *-- VisualizationCfg : visualization
BenchRunCfg *-- TimeCfg : time
DimsCfg ..> BenchCfg : uses
class bench_cfg_package {
}
bench_cfg_package o-- BenchPlotSrvCfg
bench_cfg_package o-- CacheCfg
bench_cfg_package o-- ExecutionCfg
bench_cfg_package o-- DisplayCfg
bench_cfg_package o-- VisualizationCfg
bench_cfg_package o-- TimeCfg
bench_cfg_package o-- BenchRunCfg
bench_cfg_package o-- BenchCfg
bench_cfg_package o-- DimsCfg
```
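As a usage sketch of what the diagrams describe, the legacy flat API and the new grouped access should resolve to the same values (the import path follows the package layout above; exact names may differ in the implementation):

```python
from bencher.bench_cfg import BenchRunCfg

# Legacy flat kwargs are routed to sub-configs via _DELEGATION_MAP.
cfg = BenchRunCfg(cache_results=True, repeats=5, port=8080)

# Flat access and grouped access read the same underlying parameters.
assert cfg.cache_results == cfg.cache.cache_results
assert cfg.repeats == cfg.execution.repeats
assert cfg.port == cfg.server.port
```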
|
Hey - I've found 10 issues, and left some high-level feedback:
- Several param fields are declared with `default=None` but without `allow_None=True` (e.g. `TimeCfg.time_event`, `VisualizationCfg.plot_size/plot_width/plot_height`, `BenchCfg.name/title/...`), which can conflict with `param`'s type constraints; consider explicitly enabling `allow_None` or using non-None defaults.
- `_DELEGATION_MAP` is built by blindly merging all sub-config param names, so a duplicated parameter name across sub-configs will silently map to whichever class happens to be processed last; to avoid subtle bugs, add a collision check or make the owning sub-config explicit when conflicts occur.
- In `BenchCfg.hash_persistent`, `input_vars`, `result_vars`, and `const_vars` are assumed to be non-None and iterable; to make the config more robust to partial construction, guard these attributes (e.g. treat `None` as an empty list) before iterating over them.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- Several param fields are declared with `default=None` but without `allow_None=True` (e.g. `TimeCfg.time_event`, `VisualizationCfg.plot_size/plot_width/plot_height`, `BenchCfg.name/title/...`), which can conflict with `param`’s type constraints; consider explicitly enabling `allow_None` or using non-None defaults.
- _DELEGATION_MAP is built by blindly merging all sub-config param names, so a duplicated parameter name across sub-configs will silently map to whichever class happens to be processed last; to avoid subtle bugs, add a collision check or make the owning sub-config explicit when conflicts occur.
- In `BenchCfg.hash_persistent`, `input_vars`, `result_vars`, and `const_vars` are assumed to be non-None and iterable; to make the config more robust to partial construction, guard these attributes (e.g. treat `None` as an empty list) before iterating over them.
## Individual Comments
### Comment 1
<location> `bencher/bench_cfg/bench_cfg_class.py:20-29` </location>
<code_context>
- plot_callbacks (List): Callables that take a BenchResult and return panel representation
- """
-
- input_vars = param.List(
- default=None,
- doc="A list of ParameterizedSweep variables to perform a parameter sweep over",
- )
- result_vars = param.List(
- default=None,
- doc="A list of ParameterizedSweep results collect and plot.",
- )
-
- const_vars = param.List(
- default=None,
- doc="Variables to keep constant but are different from the default value",
- )
-
- result_hmaps = param.List(default=None, doc="a list of holomap results")
-
- meta_vars = param.List(
</code_context>
<issue_to_address>
**issue (bug_risk):** Using `default=None` for list params but treating them as sequences later risks runtime errors.
Parameters like `input_vars`, `result_vars`, `const_vars`, `result_hmaps`, and `meta_vars` default to `None` but are later used as if they are always lists (e.g. `self.input_vars + self.result_vars`, `for v in self.const_vars`). If left as `None`, this will raise `TypeError`. Please either give them `[]` defaults or normalize `None` to empty lists in `__init__` before they are used.
</issue_to_address>
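A minimal sketch of the suggested normalization (the field set here is illustrative, not the full class):

```python
import param


class BenchCfg(param.Parameterized):
    input_vars = param.List(default=None, allow_None=True)
    result_vars = param.List(default=None, allow_None=True)
    const_vars = param.List(default=None, allow_None=True)

    def __init__(self, **params):
        super().__init__(**params)
        # Normalize None to [] so later code can iterate or concatenate
        # safely (e.g. self.input_vars + self.result_vars).
        for attr in ("input_vars", "result_vars", "const_vars"):
            if getattr(self, attr) is None:
                setattr(self, attr, [])
```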
### Comment 2
<location> `bencher/bench_cfg/time_cfg.py:33-34` </location>
<code_context>
- doc="Define a tag for a run to isolate the results stored in the cache from other runs",
- )
-
- run_date: datetime = param.Date(
- default=datetime.now(),
- doc="The date the bench run was performed",
- )
</code_context>
<issue_to_address>
**issue (bug_risk):** `run_date` default is evaluated at import time, not at instance creation.
Because `default=datetime.now()` is evaluated at import time, all `TimeCfg` instances in a process will share the same `run_date`, which is likely incorrect. To capture per-instance creation time, use a callable default (e.g. `param.Date(default=datetime.now)` if supported) or set `run_date` in `__init__` when it’s not provided. Otherwise this field will be inaccurate for long‑running processes.
</issue_to_address>
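One portable way to capture per-instance creation time, as the comment suggests (a sketch; whether `param.Date` accepts a callable default varies by param version, so this sets the value in `__init__` instead):

```python
from datetime import datetime

import param


class TimeCfg(param.Parameterized):
    run_date = param.Date(
        default=None,
        allow_None=True,
        doc="The date the bench run was performed",
    )

    def __init__(self, **params):
        super().__init__(**params)
        # Evaluated per instance, not once at import time.
        if self.run_date is None:
            self.run_date = datetime.now()
```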
### Comment 3
<location> `bencher/bench_cfg/bench_cfg_class.py:60-61` </location>
<code_context>
- )
- name: Optional[str] = param.String(None, doc="The name of the benchmarkCfg")
- title: Optional[str] = param.String(None, doc="The title of the benchmark")
- raise_duplicate_exception: bool = param.Boolean(
- False, doc="Use this while debugging if filename generation is unique"
- )
- bench_name: Optional[str] = param.String(
</code_context>
<issue_to_address>
**question (bug_risk):** Duplicate `raise_duplicate_exception` parameter name may conflict with the visualization config.
Because `BenchCfg` inherits from `BenchRunCfg` and delegates sub-config parameters, defining `raise_duplicate_exception` here may clash with `VisualizationCfg.raise_duplicate_exception`, affecting how `cfg.raise_duplicate_exception` is resolved vs `cfg.visualization.raise_duplicate_exception`. Consider either relying on the visualization parameter or renaming this one to avoid ambiguity and accidental shadowing.
</issue_to_address>
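A hypothetical regression test that would pin down whichever resolution is intended:

```python
import unittest

from bencher.bench_cfg import BenchCfg


class TestRaiseDuplicateResolution(unittest.TestCase):
    def test_flat_and_grouped_flags_agree(self):
        cfg = BenchCfg(bench_name="t", title="t")
        cfg.raise_duplicate_exception = True
        # If the flat name delegates to the visualization sub-config,
        # both views must agree; if BenchCfg's own param shadows it,
        # this test documents the divergence and should be updated.
        self.assertEqual(
            cfg.raise_duplicate_exception,
            cfg.visualization.raise_duplicate_exception,
        )
```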
### Comment 4
<location> `bencher/bench_cfg/run_cfg.py:31-40` </location>
<code_context>
+_PARAM_INTERNALS = {"name"}
+
+
+def _build_delegation_map():
+ """Build delegation map from sub-config param definitions."""
+ delegation_map = {}
+ for sub_name, cfg_cls in _SUBCONFIG_CLASSES.items():
+ for pname in cfg_cls.param:
+ if pname in _PARAM_INTERNALS:
+ continue
+ delegation_map[pname] = sub_name
+ return delegation_map
+
+
+_DELEGATION_MAP = _build_delegation_map()
+
+
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Delegation map currently relies on param name uniqueness across all sub-configs.
Because all param names are merged into one flat map, any name collision between sub-configs will be silently resolved in favor of the last processed sub-config. To avoid subtle routing bugs, consider either failing fast on collisions when building `_DELEGATION_MAP` or clearly documenting/enforcing global param-name uniqueness across sub-configs.
</issue_to_address>
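A fail-fast variant of the builder (sketch; `_SUBCONFIG_CLASSES` and `_PARAM_INTERNALS` are as defined in `run_cfg.py`):

```python
def _build_delegation_map():
    """Build delegation map, raising on duplicate param names."""
    delegation_map = {}
    for sub_name, cfg_cls in _SUBCONFIG_CLASSES.items():
        for pname in cfg_cls.param:
            if pname in _PARAM_INTERNALS:
                continue
            if pname in delegation_map:
                raise ValueError(
                    f"Param '{pname}' is declared by both "
                    f"'{delegation_map[pname]}' and '{sub_name}' sub-configs"
                )
            delegation_map[pname] = sub_name
    return delegation_map
```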
### Comment 5
<location> `test/test_bench_cfg.py:57` </location>
<code_context>
+ self.assertTrue(cfg.show)
+
+
+class TestBenchRunCfgComposition(unittest.TestCase):
+ """Test BenchRunCfg composition and delegation."""
+
</code_context>
<issue_to_address>
**suggestion (testing):** Add tests covering delegation of unknown/non-delegated attributes to ensure `__getattr__`/`__setattr__` do not mask AttributeErrors.
Existing tests only cover known delegated attributes; they don’t verify that truly unknown attributes still raise `AttributeError` instead of being intercepted by the delegation layer. Please add a test (e.g. `with self.assertRaises(AttributeError): _ = cfg.some_unknown_param`) to guard `__getattr__`/`__setattr__` behavior against regressions.
Suggested implementation:
```python
        self.assertIsNone(cfg.port)
        self.assertTrue(cfg.show)


class TestBenchRunCfgComposition(unittest.TestCase):
    """Test BenchRunCfg composition and delegation."""

    def test_unknown_attribute_get_raises_attribute_error(self):
        cfg = BenchRunCfg()
        with self.assertRaises(AttributeError):
            _ = cfg.some_unknown_param

    def test_unknown_attribute_set_raises_attribute_error(self):
        cfg = BenchRunCfg()
        with self.assertRaises(AttributeError):
            cfg.some_unknown_param = 42


import unittest
```
```python
from bencher.bench_cfg import (
    BenchRunCfg,
    BenchPlotSrvCfg,
    CacheCfg,
    ExecutionCfg,
    DisplayCfg,
    VisualizationCfg,
    TimeCfg,
)
```
</issue_to_address>
### Comment 6
<location> `test/test_bench_cfg.py:156-166` </location>
<code_context>
+ )
+ self.assertTrue(cfg.over_time)
+
+ def test_hash_persistent_varies_with_repeats(self):
+ """hash_persistent changes when repeats differs."""
+ cfg1 = BenchCfg(
</code_context>
<issue_to_address>
**suggestion (testing):** Extend `hash_persistent` tests to cover non-empty vars and const_vars so hashing of configuration content is validated.
This test currently only exercises how `repeats` affects `hash_persistent`, with `input_vars`, `result_vars`, and `const_vars` all empty. Since the implementation also hashes these collections, please add a test using simple stub variable objects (with a known `hash_persistent()` and `.name`) to verify that changes to inputs, constants, or results also change the hash. That will validate the new hashing is content-sensitive, not just repeat-sensitive.
```suggestion
        cfg = BenchCfg(
            input_vars=[],
            result_vars=[],
            const_vars=[],
            bench_name="test",
            title="test",
            over_time=True,
        )
        self.assertTrue(cfg.over_time)

    class _StubVar:
        def __init__(self, name, persistent_hash):
            self.name = name
            self._persistent_hash = persistent_hash

        def hash_persistent(self):
            return self._persistent_hash

    def test_hash_persistent_varies_with_vars_and_const_vars(self):
        """hash_persistent changes when input/result/const vars differ."""
        base_input = [self._StubVar("in", "in-1")]
        base_const = [self._StubVar("const", "const-1")]
        base_result = [self._StubVar("res", "res-1")]
        cfg_base = BenchCfg(
            input_vars=base_input,
            result_vars=base_result,
            const_vars=base_const,
            bench_name="test",
            title="test",
            repeats=1,
        )
        cfg_input_changed = BenchCfg(
            input_vars=[self._StubVar("in", "in-2")],
            result_vars=base_result,
            const_vars=base_const,
            bench_name="test",
            title="test",
            repeats=1,
        )
        cfg_const_changed = BenchCfg(
            input_vars=base_input,
            result_vars=base_result,
            const_vars=[self._StubVar("const", "const-2")],
            bench_name="test",
            title="test",
            repeats=1,
        )
        cfg_result_changed = BenchCfg(
            input_vars=base_input,
            result_vars=[self._StubVar("res", "res-2")],
            const_vars=base_const,
            bench_name="test",
            title="test",
            repeats=1,
        )
        base_hash = cfg_base.hash_persistent()
        self.assertNotEqual(base_hash, cfg_input_changed.hash_persistent())
        self.assertNotEqual(base_hash, cfg_const_changed.hash_persistent())
        self.assertNotEqual(base_hash, cfg_result_changed.hash_persistent())

    def test_hash_persistent_varies_with_repeats(self):
```
</issue_to_address>
### Comment 7
<location> `test/test_bench_cfg.py:189` </location>
<code_context>
+ # With include_repeats=False, hashes should be equal
+ self.assertEqual(cfg1.hash_persistent(False), cfg2.hash_persistent(False))
+
+ def test_describe_benchmark_includes_delegated_fields(self):
+ """describe_benchmark output includes delegated config values."""
+ cfg = BenchCfg(
</code_context>
<issue_to_address>
**suggestion (testing):** Broaden `describe_benchmark` coverage to include non-empty input/result/const/meta variables.
This currently only verifies delegated fields like `level`, `cache_results`, and `cache_samples`. Please add a test that supplies minimal stub entries for `input_vars`, `const_vars`, `result_vars`, and `meta_vars`, and asserts their descriptions appear in the output, so those branches remain covered as the config classes change.
Suggested implementation:
```python
        # With include_repeats=False, hashes should be equal
        self.assertEqual(cfg1.hash_persistent(False), cfg2.hash_persistent(False))

    def test_describe_benchmark_includes_delegated_and_variable_fields(self):
        """describe_benchmark output includes delegated and variable config values."""
        # Minimal stub variable entries to exercise describe_benchmark branches.
        # Replace DummyVar with the concrete var config types used in the codebase
        # (e.g. InputVarCfg/ConstVarCfg/ResultVarCfg/MetaVarCfg).
        @dataclasses.dataclass
        class DummyVar:
            name: str
            description: str

        input_var = DummyVar(name="x_input", description="input variable x")
        const_var = DummyVar(name="c_const", description="constant c")
        result_var = DummyVar(name="r_result", description="result r")
        meta_var = DummyVar(name="m_meta", description="meta m")
        cfg = BenchCfg(
            input_vars=[input_var],
            result_vars=[result_var],
            const_vars=[const_var],
            meta_vars=[meta_var],
            bench_name="test_bench",
            title="Test benchmark",
            cache_results=True,
            cache_samples=True,
            repeats=3,
        )
        description = cfg.describe_benchmark()
        # Delegated fields still covered
        self.assertIn("test_bench", description)
        self.assertIn("Test benchmark", description)
        self.assertIn("cache_results", description)
        self.assertIn("cache_samples", description)
        self.assertIn("repeats", description)
        # Variable fields: ensure each stub variable is represented in output
        self.assertIn("x_input", description)
        self.assertIn("input variable x", description)
        self.assertIn("c_const", description)
        self.assertIn("constant c", description)
        self.assertIn("r_result", description)
        self.assertIn("result r", description)
        self.assertIn("m_meta", description)
        self.assertIn("meta m", description)
```
1. Ensure `import dataclasses` (or `from dataclasses import dataclass`) is present at the top of `test/test_bench_cfg.py`. If `dataclasses` is already imported, you can drop the `dataclasses.` prefix and just use `@dataclass`.
2. Replace the `DummyVar` dataclass with the real config classes used for `input_vars`, `const_vars`, `result_vars`, and `meta_vars` (for example `InputVarCfg`, `ConstVarCfg`, `ResultVarCfg`, `MetaVarCfg`), and adjust constructor arguments accordingly.
3. If `describe_benchmark()` formats variable descriptions differently (e.g. prefixes like `input:` or shows only names), tweak the `assertIn` expectations to match the actual output shape while still ensuring that non-empty input/const/result/meta entries exercise their respective code paths.
</issue_to_address>
### Comment 8
<location> `test/test_bench_cfg.py:209` </location>
<code_context>
+ self.assertIn("cache_samples True", description)
+
+
+class TestBackwardCompatibility(unittest.TestCase):
+ """Test backward compatibility of imports."""
+
</code_context>
<issue_to_address>
**suggestion (testing):** Consider adding behavior-level backward-compatibility tests, not just import-level ones.
Right now these tests only ensure legacy modules still export the classes. It would be more robust to also construct objects through the legacy imports (e.g. `BenchRunCfg(cache_results=True)` from the top-level `bencher`) and assert that both the legacy flat attributes and the new grouped ones (e.g. `cfg.cache_results` and `cfg.cache.cache_results`) behave identically. That way, refactors in `bench_cfg` are less likely to break existing user code.
Suggested implementation:
```python
import unittest
from datetime import datetime

from bencher import BenchRunCfg
from bencher.bench_cfg import (
    BenchPlotSrvCfg,
    CacheCfg,
    ExecutionCfg,
    DisplayCfg,
    VisualizationCfg,
    TimeCfg,
```
```python
class TestBackwardCompatibility(unittest.TestCase):
    """Test backward compatibility of imports and behavior."""

    def test_imports(self):
        """Legacy imports from bencher.bench_cfg remain available."""
        # The import-level check is done via the module-level imports above.
        # This test provides an explicit assertion so regressions are reported clearly.
        self.assertIsNotNone(BenchPlotSrvCfg)
        self.assertIsNotNone(CacheCfg)
        self.assertIsNotNone(ExecutionCfg)
        self.assertIsNotNone(DisplayCfg)
        self.assertIsNotNone(VisualizationCfg)
        self.assertIsNotNone(TimeCfg)

    def test_legacy_benchruncfg_cache_attributes(self):
        """Ensure legacy flat cache attributes and new grouped ones behave identically."""
        # Construct via the legacy top-level import to mimic existing user code.
        cfg = BenchRunCfg(cache_results=True, cache_samples=True)
        # Legacy flat attributes must still be present.
        self.assertTrue(cfg.cache_results)
        self.assertTrue(cfg.cache_samples)
        # New grouped cache configuration should be kept in sync.
        self.assertTrue(cfg.cache.cache_results)
        self.assertTrue(cfg.cache.cache_samples)
        # The values exposed via legacy and grouped attributes must match.
        self.assertEqual(cfg.cache_results, cfg.cache.cache_results)
        self.assertEqual(cfg.cache_samples, cfg.cache.cache_samples)
```
</issue_to_address>
### Comment 9
<location> `test/test_bench_cfg.py:18` </location>
<code_context>
+)
+
+
+class TestSubConfigClasses(unittest.TestCase):
+ """Test individual sub-configuration classes."""
+
</code_context>
<issue_to_address>
**suggestion (testing):** Add tests that verify sub-config defaults interact correctly when wired through `BenchRunCfg`.
Current tests cover each sub-config’s defaults and `TestBenchRunCfgComposition` covers delegation, but nothing verifies that a new `BenchRunCfg()` exposes those defaults correctly via its flat API (e.g. `BenchRunCfg().cache_results` is `False`, `BenchRunCfg().over_time` is `False`, etc.). Please add a test that instantiates `BenchRunCfg()` and asserts a representative subset of these flat attributes matches the corresponding sub-config defaults to guard against wiring regressions.
</issue_to_address>
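A sketch of such a wiring test; rather than hard-coding default values, it compares the flat view against the grouped one (assuming `BenchRunCfg` is importable from `bencher.bench_cfg` as elsewhere in these tests):

```python
import unittest

from bencher.bench_cfg import BenchRunCfg


class TestFlatDefaultWiring(unittest.TestCase):
    def test_flat_defaults_match_subconfig_defaults(self):
        cfg = BenchRunCfg()
        # A representative subset of delegated params; the flat API must
        # expose exactly the sub-config defaults on a fresh instance.
        self.assertEqual(cfg.cache_results, cfg.cache.cache_results)
        self.assertEqual(cfg.over_time, cfg.time.over_time)
        self.assertEqual(cfg.repeats, cfg.execution.repeats)
        self.assertEqual(cfg.port, cfg.server.port)
```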
### Comment 10
<location> `bencher/bench_cfg/run_cfg.py:18` </location>
<code_context>
+from bencher.bench_cfg.time_cfg import TimeCfg
+
+# Sub-config classes mapped to their attribute names
+_SUBCONFIG_CLASSES = {
+ "server": BenchPlotSrvCfg,
+ "cache": CacheCfg,
</code_context>
<issue_to_address>
**issue (complexity):** Consider replacing the reflection-built delegation map with a fixed, explicitly listed mapping of supported legacy flat parameters to sub-configs to simplify and clarify the API surface.
You can drop the reflection-based delegation and make the mapping explicit while keeping all current behavior (legacy flat kwargs, delegation, ctor routing).
Instead of `_SUBCONFIG_CLASSES`, `_PARAM_INTERNALS`, and `_build_delegation_map()`, define a static map of the legacy flat names you actually want to support:
```python
# Explicit, controlled mapping of legacy flat names → sub-config attribute
_DELEGATION_MAP = {
    # cache
    "cache_results": "cache",
    "only_plot": "cache",
    # execution
    "repeats": "execution",
    # display
    "some_display_param": "display",
    # visualization
    "some_vis_param": "visualization",
    # time
    "time_event": "time",
    # server
    "port": "server",
    # ...add only the legacy names you intentionally support
}
```
Then you can remove:
```python
_SUBCONFIG_CLASSES = { ... }
_PARAM_INTERNALS = {"name"}
def _build_delegation_map():
    ...
_DELEGATION_MAP = _build_delegation_map()
```
All existing logic in `__init__`, `__getattr__`, and `__setattr__` continues to work unchanged, but:
- Adding a new param to a sub-config no longer silently creates a new top-level attribute on `BenchRunCfg`.
- The delegated surface API is visible and grep-able in one place.
- Coupling is local to `BenchRunCfg` instead of driven by external `param` definitions.
If you want to make the intent even clearer without changing behavior, you can also add a short comment at the map definition pointing out that it defines the legacy flat API:
```python
# Legacy flat API maintained for backward compatibility.
# New code should prefer cfg.<subconfig>.<param> access.
_DELEGATION_MAP = { ... }
```
</issue_to_address>
Codecov Report

✅ All modified and coverable lines are covered by tests.

```
@@           Coverage Diff           @@
##             main     #689   +/-   ##
=======================================
  Coverage   89.39%   89.39%
=======================================
  Files          91       91
  Lines        4432     4432
=======================================
  Hits         3962     3962
  Misses        470      470
```
|
Are these new flaky tests or old tests? The existing tests are not flaky, so if this is increasing the deadlines on old tests it indicates a problem with the implementation. (Sorry, on phone so it's difficult to see the trace of the test heritage) |
I'm fairly sure they're existing flaky tests. |
|
Flaky on CI or your machine? I ironed out all flaky tests a long time ago |
|
My meaning being, if they have become flaky now, it indicates a performance regression |
|
Ah, interesting. Right, so CI runs on Linux (platforms = ["linux-64"] in pyproject.toml). Interesting though. Could be worth increasing the deadlines anyway on those tests then, and/or testing on different platforms? |
|
Ah ok, that doesn't sound like a performance regression after all then, and yes, we could add Mac to the test matrix |
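For context, relaxing a Hypothesis deadline usually looks like the sketch below; the concrete test shown is an assumption, not the actual contents of test_job.py:

```python
from hypothesis import given, settings, strategies as st

# Multiprocessing setup can blow past Hypothesis's default 200ms
# per-example deadline, so either extend it or disable it outright.
@settings(deadline=1000)  # milliseconds; use deadline=None to disable
@given(st.integers(min_value=1, max_value=8))
def test_job_runs_with_workers(workers):
    ...  # body elided; would exercise the job runner with `workers` processes
```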
Force-pushed f1ffbb9 to a63e6af
|
Just updated - it was based off the other PR. This was meant to be a standalone test fix. |
|
Yeah, but given it was based on the previous PR, which changed cache/hash behaviour, I wondered if those changes were causing performance regressions. It sounds like that is not the case though, and it's related to platform differences. |
Summary
Summary by Sourcery
Introduce a structured benchmark configuration system with composable sub-config classes and update tests and configuration access accordingly.
Bug Fixes:
- Relax Hypothesis deadlines in multiprocessing-related tests to fix flakiness.
Enhancements:
- Refactor benchmark configuration into a compositional bencher.bench_cfg package with sub-config classes and a delegating BenchRunCfg/BenchCfg.
Tests:
- Add comprehensive tests for the new configuration API.