Add comprehensive tests for kwargs validation across all functions by Shlokpalrecha · Pull Request #314 · arviz-devs/arviz-stats

Shlokpalrecha · 2026-03-03T15:49:02Z

What this PR does

Adds test coverage for kwargs validation across arviz-stats functions (relates to #142).

Background

While working on #142, I tested whether functions properly reject invalid kwargs. Turns out they all work correctly! But I noticed we only have tests for 4 functions (from PR #143), so I added comprehensive coverage.

My investigation process

I wrote a few scripts to systematically test functions with:

Random invalid kwargs like invalid_kwarg="test"
Common typos I've seen in practice (dims instead of dim, etc.)
Both direct function calls and the .azstats accessor

All 24+ functions I tested handle invalid kwargs properly by raising TypeError. Great!

What I added

tests/test_kwargs_validation.py - 26 tests organized by function category:

Sampling diagnostics: ess, rhat, mcse, bfmi, diagnose
Summary stats: summary, mean, median, mode, ci_in_rope
Visualization: qds, ecdf
LOO/model comparison: loo, compare
Metrics: bayesian_r2, metrics
Other functions: thin, weight_predictions, bayes_factor, psense
Accessor methods (6 tests for .azstats.* calls)

Testing

pytest tests/test_kwargs_validation.py -v    # 26/26 passed
pytest                                        # 2475 passed, 2 skipped  
tox -e check                                  # all checks passed

- Add 26 tests covering kwargs validation for all major user-facing functions - Tests ensure functions properly raise TypeError for invalid keyword arguments - Covers sampling diagnostics, summary statistics, LOO functions, metrics, and accessors - Includes tests for common typos (e.g., 'dims' instead of 'dim') - All tests pass successfully (26/26) - Full test suite passes (2475 passed, 2 skipped) This addresses issue arviz-devs#142 by adding comprehensive test coverage to ensure kwargs validation continues to work properly and prevent regressions. While PR arviz-devs#143 initially fixed the validation for hdi/eti/kde/histogram, investigation shows all functions now properly validate kwargs, but test coverage was incomplete. Related to arviz-devs#142

read-the-docs-community · 2026-03-03T15:51:33Z

Documentation build overview

📚 arviz-stats | 🛠️ Build #31831656 | 📁 Comparing d869747 against latest (0738be9)

🔍 Preview build

Show files changed (35 files in total): 📝 29 modified | ➕ 0 added | ➖ 6 deleted

File	Status
genindex.html	📝 modified
_modules/index.html	📝 modified
api/index.html	📝 modified
_modules/arviz_stats/summary.html	📝 modified
api/generated/arviz_stats.base.array_stats.histogram.html	📝 modified
api/generated/arviz_stats.base.dataarray_stats.histogram.html	📝 modified
api/generated/arviz_stats.eti.html	📝 modified
api/generated/arviz_stats.hdi.html	📝 modified
api/generated/arviz_stats.histogram.html	📝 modified
api/generated/arviz_stats.iqr.html	➖ deleted
api/generated/arviz_stats.kde.html	📝 modified
api/generated/arviz_stats.loo_approximate_posterior.html	📝 modified
api/generated/arviz_stats.loo_expectations.html	📝 modified
api/generated/arviz_stats.loo_influence.html	➖ deleted
api/generated/arviz_stats.loo_kfold.html	📝 modified
api/generated/arviz_stats.loo_metrics.html	📝 modified
api/generated/arviz_stats.loo_moment_match.html	📝 modified
api/generated/arviz_stats.loo_score.html	📝 modified
api/generated/arviz_stats.loo_subsample.html	📝 modified
api/generated/arviz_stats.mad.html	➖ deleted
api/generated/arviz_stats.mean.html	📝 modified
api/generated/arviz_stats.median.html	📝 modified
api/generated/arviz_stats.mode.html	📝 modified
api/generated/arviz_stats.numba.array_stats.histogram.html	📝 modified
api/generated/arviz_stats.qds.html	📝 modified
api/generated/arviz_stats.residual_r2.html	📝 modified
api/generated/arviz_stats.std.html	➖ deleted
api/generated/arviz_stats.summary.html	📝 modified
api/generated/arviz_stats.var.html	➖ deleted
api/generated/arviz_stats.wasserstein.html	📝 modified
_modules/arviz_stats/loo/loo_approximate_posterior.html	📝 modified
_modules/arviz_stats/loo/loo_expectations.html	📝 modified
_modules/arviz_stats/loo/loo_influence.html	➖ deleted
_modules/arviz_stats/loo/loo_moment_match.html	📝 modified
_modules/arviz_stats/loo/loo_subsample.html	📝 modified

OriolAbril

Thanks this is a good start. I am starting with a few global comments and we can iterate from there. There might be some exceptions to the general comments I am mentioning but it will be easier to see those once the file is shorter and with less redundancy

OriolAbril · 2026-03-04T16:40:40Z

tests/test_kwargs_validation.py

+
+# pylint: disable=redefined-outer-name, no-self-use, unexpected-keyword-arg
+import pytest
+from arviz_base import load_arviz_data


take a look at the other test files that import arviz_base. We want arviz-stats to also be usable without arviz-base installed and we run tests in minimal environments so it needs some extra care. Ref: https://python.arviz.org/projects/stats/en/latest/contributing/testing.html#how-to-write-tests

OriolAbril · 2026-03-04T16:42:54Z

tests/test_kwargs_validation.py

+    @pytest.fixture(scope="class")
+    def idata(self):
+        """Load test data."""
+        return load_arviz_data("non_centered_eight")


there is no need to define this fixture, the ones in conftest.py are available from any test file. For these tests you can use datatree directly

OriolAbril · 2026-03-04T17:03:23Z

tests/test_kwargs_validation.py

+    def test_ess_rejects_invalid_kwargs(self, idata):
+        """Test that ess() raises TypeError for invalid kwargs."""
+        import arviz_stats as azs
+
+        with pytest.raises(TypeError, match=".*unexpected keyword argument.*"):
+            azs.ess(idata.posterior, invalid_kwarg="test")
+
+        # Also test common typos
+        with pytest.raises(TypeError, match=".*unexpected keyword argument.*"):
+            azs.ess(idata.posterior, methods="bulk")  # should be 'method'


for testing functions a couple comments.

The first and main one is we want to test we don't silently ignore **kwargs. Therefore, these types of tests will only be relevant to functions that take **kwargs. There are many functions that do but not all of them do. ess in particular does not, so this test is testing basic builtin features of python functions. If we can't rely on that this test failing would be the least of our problems. Only functions that take **kwargs should get these kind of tests.

The second is about the test content and style so it is not relevant for ess, rhat, and other functions that do not take **kwargs but it is for all the ones who do. We ignore doc related link checks in tests because the test name and body is usually self explanatory, we don't see any value added by adding a docstring that is basically the test name with better grammar. arviz_stats is the library we are testing so it should be imported only once at the top of the file with the other imports, and the two pytest.raises are doing the same check so one is enough, when it comes to checking if invalid_kwarg or methods is an argument of the function the function is doing exactly the same, we'd add the two different ones if we are triggering a different error or triggering the same error through a different code branch.

After doing all of that, the tests can be parametrized by the function to test on. As you probably have seen in #143, we have a single test function that checks multiple accessors through pytest.mark.parametrize. Here the same should happen with the relevant top level functions which I would rope together even if their source code is defined in different source files.

OriolAbril · 2026-03-04T17:04:25Z

INVESTIGATION_SUMMARY.md

this file should not be commited

Address review feedback from OriolAbril on PR arviz-devs#314: - Only test accessor methods that accept **kwargs - Use pytest.mark.parametrize to reduce test duplication - Remove test docstrings (test names are self-explanatory) - Use importorskip pattern for minimal environment support - Use datatree fixture from conftest.py instead of redefining - Remove INVESTIGATION_SUMMARY.md from repository - Import numpy at module level instead of inside test functions - Follow the same pattern as tests in PR arviz-devs#143 The parametrized test covers 20 accessor methods, with 3 additional tests for methods that require positional arguments before **kwargs. All 23 tests pass successfully.

Shlokpalrecha · 2026-03-05T16:23:39Z

Thanks for the feedback! I've addressed all your points:

Only testing accessor methods with **kwargs now
Using pytest.mark.parametrize for the 20 similar functions
Removed docstrings and INVESTIGATION_SUMMARY.md
Using importorskip and datatree fixture from conftest
Following the pattern from PR raise when extra kwargs present in hdi&kde, add test for eti,hdi,kde&histogram #143

All 23 tests pass. Ready for another look.

OriolAbril

now that tests are parametrized and don't take as much space they should be moved (and combined/integrated with) the tests in base/test_stats.py. Otherwise we will be extremely confused in the future as to why the kde accessor is tested in one place but the ecdf one is tested somewhere else. Organization of tests here in arviz-stats is quite a disaster so we might also want to move them somewhere else which will be much easier if they are all in the same place

OriolAbril · 2026-03-15T21:36:37Z

tests/test_kwargs_validation.py

+        getattr(accessor, func)(invalid_kwarg="value")
+
+
+def test_loo_score_kwargs_raise(datatree):


these bottom 3 can also be parametrized into one test function

OriolAbril · 2026-03-15T21:39:00Z

tests/test_kwargs_validation.py

+    ),
+)
+def test_accessor_kwargs_raise(datatree, func):
+    accessor = datatree.posterior.ds.azstats


Suggested change

accessor = datatree.posterior.ds.azstats

accessor = datatree.posterior.dataset.azstats

minor nit, .ds is the older syntax. not fully deprecated but discouraged so it should not be used in new code

OriolAbril · 2026-03-15T21:41:00Z

tests/test_kwargs_validation.py

+
+def test_loo_score_kwargs_raise(datatree):
+    accessor = datatree.posterior.ds.azstats
+    y_obs = xr.DataArray(np.random.randn(8))


Suggested change

y_obs = xr.DataArray(np.random.randn(8))

rng = np.random.default_rng()

y_obs = xr.DataArray(rng.normal(size=8))

we should also use the numpy random Generators instead of RandomState classes (either explicitly or through the global np.random.<distribution>.

codecov-commenter · 2026-03-15T21:50:18Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 84.16%. Comparing base (4383159) to head (142ae20).
⚠️ Report is 11 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #314      +/-   ##
==========================================
- Coverage   84.42%   84.16%   -0.26%     
==========================================
  Files          42       42              
  Lines        5829     5930     +101     
==========================================
+ Hits         4921     4991      +70     
- Misses        908      939      +31

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Address review feedback from OriolAbril on PR arviz-devs#314: - Only test accessor methods that accept **kwargs - Use pytest.mark.parametrize to reduce test duplication - Remove test docstrings (test names are self-explanatory) - Use importorskip pattern for minimal environment support - Use datatree fixture from conftest.py instead of redefining - Remove INVESTIGATION_SUMMARY.md from repository - Import numpy at module level instead of inside test functions - Follow the same pattern as tests in PR arviz-devs#143 The parametrized test covers 20 accessor methods, with 3 additional tests for methods that require positional arguments before **kwargs. All 23 tests pass successfully.

Shlokpalrecha · 2026-03-16T19:18:34Z

Moved everything into base/test_stats.py, parametrized the 3 loo tests into one, switched to .dataset and np.random.default_rng(). Thanks for the pointers!

OriolAbril reviewed Mar 4, 2026

View reviewed changes

Shlokpalrecha requested a review from OriolAbril March 7, 2026 04:07

OriolAbril reviewed Mar 15, 2026

View reviewed changes

Shlokpalrecha force-pushed the fix-kwargs-validation branch from 142ae20 to d869747 Compare March 16, 2026 19:16

Shlokpalrecha requested a review from OriolAbril March 16, 2026 19:18

		getattr(accessor, func)(invalid_kwarg="value")


		def test_loo_score_kwargs_raise(datatree):

	accessor = datatree.posterior.ds.azstats
	accessor = datatree.posterior.dataset.azstats

	y_obs = xr.DataArray(np.random.randn(8))
	rng = np.random.default_rng()
	y_obs = xr.DataArray(rng.normal(size=8))

Uh oh!

Conversation

Shlokpalrecha commented Mar 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What this PR does

Background

My investigation process

What I added

Testing

Uh oh!

read-the-docs-community bot commented Mar 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Documentation build overview

Uh oh!

OriolAbril left a comment

Choose a reason for hiding this comment

Uh oh!

OriolAbril Mar 4, 2026

Choose a reason for hiding this comment

Uh oh!

OriolAbril Mar 4, 2026

Choose a reason for hiding this comment

Uh oh!

OriolAbril Mar 4, 2026

Choose a reason for hiding this comment

Uh oh!

OriolAbril Mar 4, 2026

Choose a reason for hiding this comment

Uh oh!

Shlokpalrecha commented Mar 5, 2026

Uh oh!

OriolAbril left a comment

Choose a reason for hiding this comment

Uh oh!

OriolAbril Mar 15, 2026

Choose a reason for hiding this comment

Uh oh!

OriolAbril Mar 15, 2026

Choose a reason for hiding this comment

Uh oh!

OriolAbril Mar 15, 2026

Choose a reason for hiding this comment

Uh oh!

codecov-commenter commented Mar 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Shlokpalrecha commented Mar 16, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Shlokpalrecha commented Mar 3, 2026 •

edited

Loading

read-the-docs-community bot commented Mar 3, 2026 •

edited

Loading

codecov-commenter commented Mar 15, 2026 •

edited

Loading