
Add comprehensive tests for kwargs validation across all functions #314

Open
Shlokpalrecha wants to merge 2 commits into arviz-devs:main from Shlokpalrecha:fix-kwargs-validation


@Shlokpalrecha Shlokpalrecha commented Mar 3, 2026

What this PR does

Adds test coverage for kwargs validation across arviz-stats functions (relates to #142).

Background

While working on #142, I tested whether functions properly reject invalid kwargs. It turns out they all do! However, only 4 functions had tests for this (from PR #143), so I added comprehensive coverage.

My investigation process

I wrote a few scripts to systematically test functions with:

  • Arbitrary invalid kwargs such as invalid_kwarg="test"
  • Common typos I've seen in practice (dims instead of dim, etc.)
  • Both direct function calls and the .azstats accessor

All 24+ functions I tested handle invalid kwargs properly by raising TypeError. Great!

What I added

tests/test_kwargs_validation.py - 26 tests organized by function category:

  • Sampling diagnostics: ess, rhat, mcse, bfmi, diagnose
  • Summary stats: summary, mean, median, mode, ci_in_rope
  • Visualization: qds, ecdf
  • LOO/model comparison: loo, compare
  • Metrics: bayesian_r2, metrics
  • Other functions: thin, weight_predictions, bayes_factor, psense
  • Accessor methods (6 tests for .azstats.* calls)

Testing

pytest tests/test_kwargs_validation.py -v    # 26/26 passed
pytest                                        # 2475 passed, 2 skipped  
tox -e check                                  # all checks passed

- Add 26 tests covering kwargs validation for all major user-facing functions
- Tests ensure functions properly raise TypeError for invalid keyword arguments
- Covers sampling diagnostics, summary statistics, LOO functions, metrics, and accessors
- Includes tests for common typos (e.g., 'dims' instead of 'dim')
- All tests pass successfully (26/26)
- Full test suite passes (2475 passed, 2 skipped)

This addresses issue arviz-devs#142 by adding comprehensive test coverage so that kwargs
validation keeps working and regressions are caught. PR arviz-devs#143 fixed the validation
for hdi/eti/kde/histogram; investigation shows that all functions now validate kwargs
properly, but test coverage was incomplete.

Related to arviz-devs#142
read-the-docs-community bot commented Mar 3, 2026

Documentation build overview

📚 arviz-stats | 🛠️ Build #31831656 | 📁 Comparing d869747 against latest (0738be9)


🔍 Preview build

Files changed (35 files in total): 📝 29 modified | ➕ 0 added | ➖ 6 deleted
File Status
genindex.html 📝 modified
_modules/index.html 📝 modified
api/index.html 📝 modified
_modules/arviz_stats/summary.html 📝 modified
api/generated/arviz_stats.base.array_stats.histogram.html 📝 modified
api/generated/arviz_stats.base.dataarray_stats.histogram.html 📝 modified
api/generated/arviz_stats.eti.html 📝 modified
api/generated/arviz_stats.hdi.html 📝 modified
api/generated/arviz_stats.histogram.html 📝 modified
api/generated/arviz_stats.iqr.html ➖ deleted
api/generated/arviz_stats.kde.html 📝 modified
api/generated/arviz_stats.loo_approximate_posterior.html 📝 modified
api/generated/arviz_stats.loo_expectations.html 📝 modified
api/generated/arviz_stats.loo_influence.html ➖ deleted
api/generated/arviz_stats.loo_kfold.html 📝 modified
api/generated/arviz_stats.loo_metrics.html 📝 modified
api/generated/arviz_stats.loo_moment_match.html 📝 modified
api/generated/arviz_stats.loo_score.html 📝 modified
api/generated/arviz_stats.loo_subsample.html 📝 modified
api/generated/arviz_stats.mad.html ➖ deleted
api/generated/arviz_stats.mean.html 📝 modified
api/generated/arviz_stats.median.html 📝 modified
api/generated/arviz_stats.mode.html 📝 modified
api/generated/arviz_stats.numba.array_stats.histogram.html 📝 modified
api/generated/arviz_stats.qds.html 📝 modified
api/generated/arviz_stats.residual_r2.html 📝 modified
api/generated/arviz_stats.std.html ➖ deleted
api/generated/arviz_stats.summary.html 📝 modified
api/generated/arviz_stats.var.html ➖ deleted
api/generated/arviz_stats.wasserstein.html 📝 modified
_modules/arviz_stats/loo/loo_approximate_posterior.html 📝 modified
_modules/arviz_stats/loo/loo_expectations.html 📝 modified
_modules/arviz_stats/loo/loo_influence.html ➖ deleted
_modules/arviz_stats/loo/loo_moment_match.html 📝 modified
_modules/arviz_stats/loo/loo_subsample.html 📝 modified

Member

@OriolAbril OriolAbril left a comment


Thanks, this is a good start. I am starting with a few global comments and we can iterate from there. There might be some exceptions to the general comments I am mentioning, but it will be easier to see those once the file is shorter and has less redundancy.


# pylint: disable=redefined-outer-name, no-self-use, unexpected-keyword-arg
import pytest
from arviz_base import load_arviz_data
Member

Take a look at the other test files that import arviz_base. We want arviz-stats to also be usable without arviz-base installed, and we run tests in minimal environments, so this needs some extra care. Ref: https://python.arviz.org/projects/stats/en/latest/contributing/testing.html#how-to-write-tests
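A minimal sketch of one way to guard an optional dependency with pytest's importorskip. The test name is hypothetical, and the testing guide linked above remains the reference for the conventions arviz-stats actually uses. Called inside a test, importorskip skips only that test when arviz_base is missing; called at module level, it skips the whole file.

```python
import pytest


def test_needs_arviz_base():
    # Hypothetical test. importorskip returns the imported module when
    # arviz_base is installed; otherwise it raises pytest's Skipped outcome,
    # so only this test is skipped in a minimal environment.
    azb = pytest.importorskip("arviz_base")
    assert hasattr(azb, "load_arviz_data")
```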

Comment on lines +18 to +21
@pytest.fixture(scope="class")
def idata(self):
    """Load test data."""
    return load_arviz_data("non_centered_eight")
Member

There is no need to define this fixture; the ones in conftest.py are available from any test file. For these tests you can use datatree directly.

Comment on lines +23 to +32
def test_ess_rejects_invalid_kwargs(self, idata):
    """Test that ess() raises TypeError for invalid kwargs."""
    import arviz_stats as azs

    with pytest.raises(TypeError, match=".*unexpected keyword argument.*"):
        azs.ess(idata.posterior, invalid_kwarg="test")

    # Also test common typos
    with pytest.raises(TypeError, match=".*unexpected keyword argument.*"):
        azs.ess(idata.posterior, methods="bulk")  # should be 'method'
Member

For testing functions, a couple of comments.


The first and main one is that we want to test that we don't silently ignore **kwargs. Therefore, these tests are only relevant for functions that take **kwargs. Many functions do, but not all of them. ess in particular does not, so this test is only exercising built-in behavior of Python functions; if we couldn't rely on that, this test failing would be the least of our problems. Only functions that take **kwargs should get this kind of test.
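One way to tell which functions take **kwargs is to inspect their signatures with the standard-library inspect module. This is a sketch, not something from the PR: strict and forwarding are hypothetical stand-ins for, respectively, a function like ess with a fixed signature and one that forwards **kwargs.

```python
import inspect


def accepts_var_kwargs(func):
    """Return True if ``func`` declares a ``**kwargs``-style parameter."""
    return any(
        param.kind == inspect.Parameter.VAR_KEYWORD
        for param in inspect.signature(func).parameters.values()
    )


# Hypothetical stand-ins: a strict function and one that forwards **kwargs.
def strict(data, method="bulk"):
    return data


def forwarding(data, dim=None, **kwargs):
    return data


print(accepts_var_kwargs(strict))      # False
print(accepts_var_kwargs(forwarding))  # True

# For `strict`, Python itself raises TypeError on an unknown keyword, so a
# dedicated test would only be re-checking the language.
try:
    strict([1, 2], methods="bulk")
except TypeError as err:
    print(err)
```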


The second is about test content and style, so it is not relevant for ess, rhat, and other functions that do not take **kwargs, but it is for all the ones that do. We disable docstring-related lint checks in tests because the test name and body are usually self-explanatory; we don't see any value in adding a docstring that is basically the test name with better grammar. arviz_stats is the library we are testing, so it should be imported only once at the top of the file with the other imports. The two pytest.raises blocks are doing the same check, so one is enough: whether the unexpected argument is called invalid_kwarg or methods, the function does exactly the same thing. We'd add two different ones only if they triggered a different error, or the same error through a different code branch.

After doing all of that, the tests can be parametrized by the function under test. As you have probably seen in #143, we have a single test function that checks multiple accessors through pytest.mark.parametrize. The same should happen here with the relevant top-level functions, which I would group together even if they are defined in different source files.
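A minimal sketch of the single parametrized test described above. fake_hdi, fake_kde and the _check_kwargs helper are hypothetical stand-ins so the block runs on its own; in the real test file the parameter list would hold the arviz_stats top-level functions that take **kwargs, and each would be called on the datatree fixture.

```python
import pytest


def _check_kwargs(func_name, kwargs, allowed):
    # Hypothetical validation helper: reject any keyword not explicitly allowed.
    unexpected = set(kwargs) - set(allowed)
    if unexpected:
        raise TypeError(f"{func_name}() got unexpected keyword argument(s): {sorted(unexpected)}")


def fake_hdi(data, prob=None, **kwargs):
    # Stand-in for a function that forwards **kwargs to a lower-level routine.
    _check_kwargs("fake_hdi", kwargs, allowed={"method"})
    return data


def fake_kde(data, grid_len=512, **kwargs):
    _check_kwargs("fake_kde", kwargs, allowed={"bw", "circular"})
    return data


@pytest.mark.parametrize("func", [fake_hdi, fake_kde])
def test_kwargs_raise(func):
    # One test, one pytest.raises: the argument name does not matter because
    # every unknown keyword goes through the same validation branch.
    with pytest.raises(TypeError):
        func([0.0, 1.0], invalid_kwarg="value")
```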

Member

This file should not be committed.

Shlokpalrecha added a commit to Shlokpalrecha/arviz-stats that referenced this pull request Mar 5, 2026
Address review feedback from OriolAbril on PR arviz-devs#314:

- Only test accessor methods that accept **kwargs
- Use pytest.mark.parametrize to reduce test duplication
- Remove test docstrings (test names are self-explanatory)
- Use importorskip pattern for minimal environment support
- Use datatree fixture from conftest.py instead of redefining
- Remove INVESTIGATION_SUMMARY.md from repository
- Import numpy at module level instead of inside test functions
- Follow the same pattern as tests in PR arviz-devs#143

The parametrized test covers 20 accessor methods, with 3 additional
tests for methods that require positional arguments before **kwargs.

All 23 tests pass successfully.
@Shlokpalrecha
Author

Thanks for the feedback! I've addressed all your points in the commit above.

All 23 tests pass. Ready for another look.

@Shlokpalrecha Shlokpalrecha requested a review from OriolAbril March 7, 2026 04:07
Member

@OriolAbril OriolAbril left a comment

Now that the tests are parametrized and don't take as much space, they should be moved into (and combined with) the tests in base/test_stats.py. Otherwise, in the future we will be extremely confused as to why the kde accessor is tested in one place but the ecdf one is tested somewhere else. Test organization in arviz-stats is quite a disaster, so we might also want to move the tests somewhere else, which will be much easier if they are all in the same place.

getattr(accessor, func)(invalid_kwarg="value")


def test_loo_score_kwargs_raise(datatree):
Member

These bottom 3 can also be parametrized into one test function.
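A sketch of how the three tests that need a positional argument before **kwargs could share one test, pairing each method name with a callable that builds its positional arguments. FakeAccessor and other_loo_method are hypothetical stand-ins; only the loo_score name and the getattr call pattern come from the snippets in this review.

```python
import numpy as np
import pytest


class FakeAccessor:
    """Hypothetical stand-in for ``datatree.posterior.dataset.azstats``."""

    @staticmethod
    def _reject(kwargs):
        # Simulates kwargs validation: any unknown keyword raises TypeError.
        if kwargs:
            raise TypeError(f"unexpected keyword argument(s): {sorted(kwargs)}")

    def loo_score(self, y_obs, **kwargs):
        self._reject(kwargs)
        return y_obs

    def other_loo_method(self, n_folds, **kwargs):  # hypothetical name
        self._reject(kwargs)
        return n_folds


@pytest.mark.parametrize(
    ("func", "make_args"),
    [
        # Each case builds the positional arguments its method needs.
        ("loo_score", lambda rng: (rng.normal(size=8),)),
        ("other_loo_method", lambda rng: (5,)),
    ],
)
def test_accessor_positional_kwargs_raise(func, make_args):
    accessor = FakeAccessor()
    rng = np.random.default_rng(0)
    with pytest.raises(TypeError):
        getattr(accessor, func)(*make_args(rng), invalid_kwarg="value")
```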

),
)
def test_accessor_kwargs_raise(datatree, func):
    accessor = datatree.posterior.ds.azstats
Member

Suggested change
accessor = datatree.posterior.ds.azstats
accessor = datatree.posterior.dataset.azstats

Minor nit: .ds is the older syntax. It is not fully deprecated but it is discouraged, so it should not be used in new code.


def test_loo_score_kwargs_raise(datatree):
    accessor = datatree.posterior.ds.azstats
    y_obs = xr.DataArray(np.random.randn(8))
Member

Suggested change
y_obs = xr.DataArray(np.random.randn(8))
rng = np.random.default_rng()
y_obs = xr.DataArray(rng.normal(size=8))

We should also use NumPy random Generators instead of the legacy RandomState class (whether it is used explicitly or through the global np.random.<distribution> functions).
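A small sketch of the difference between the two NumPy APIs. The seed 42 is an arbitrary assumption added here to show reproducibility; the suggested change above calls default_rng() without a seed.

```python
import numpy as np

# Legacy API (discouraged): draws from the hidden global RandomState.
legacy = np.random.randn(8)

# Generator API: an explicit, independently seedable generator object.
rng = np.random.default_rng(42)
y_obs = rng.normal(size=8)

# A second generator with the same seed reproduces the same draws, without
# reading or modifying any global state.
same = np.random.default_rng(42).normal(size=8)
assert np.array_equal(y_obs, same)
```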

codecov-commenter commented Mar 15, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 84.16%. Comparing base (4383159) to head (142ae20).
⚠️ Report is 11 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #314      +/-   ##
==========================================
- Coverage   84.42%   84.16%   -0.26%     
==========================================
  Files          42       42              
  Lines        5829     5930     +101     
==========================================
+ Hits         4921     4991      +70     
- Misses        908      939      +31     


@Shlokpalrecha Shlokpalrecha force-pushed the fix-kwargs-validation branch from 142ae20 to d869747 Compare March 16, 2026 19:16
@Shlokpalrecha
Author

Moved everything into base/test_stats.py, parametrized the 3 loo tests into one, switched to .dataset and np.random.default_rng(). Thanks for the pointers!


Labels

None yet

Projects

None yet


3 participants