All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
- `strategy` parameter removed from `estimate_propensity_pairwise()`: Callers using `strategy=` must switch to `method=`. The `strategy` parameter was previously validated but had no effect on behavior.
- `estimate_propensity_pairwise()` default `method` is now `"auto"`: Previously defaulted to `"condlogit"`, which emitted a warning on every call in environments without SciPy. `"auto"` selects `condlogit` when SciPy is available and falls back to `multinomial` silently (see the sketch below).
- `estimate_propensity_pairwise()` parameters after `design` are now keyword-only: All parameters except `design` must be passed as keyword arguments. This enforces clarity after the signature change and prevents silent positional-argument misuse.
- `evaluate_pairwise_models()` `propensity` parameter now accepts `"auto"`: Type hint expanded from `Literal["condlogit", "multinomial"]` to `Literal["auto", "condlogit", "multinomial"]`.
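A minimal sketch of the new calling convention follows. It only restates the behavior described above; the body, the SciPy check, and anything beyond the `design` and `method` parameters are illustrative assumptions, not the real implementation.

```python
# Illustrative sketch only: restates the calling convention described above.
# The real implementation and any parameters beyond `design`/`method` differ.
def estimate_propensity_pairwise(design, *, method="auto"):
    # Parameters after `design` are keyword-only (note the bare `*`).
    try:
        import scipy  # noqa: F401
        resolved = "condlogit" if method == "auto" else method
    except ImportError:
        # "auto" silently falls back to multinomial when SciPy is missing.
        resolved = "multinomial" if method == "auto" else method
    return resolved

# estimate_propensity_pairwise(design, "condlogit")      # TypeError: positional arg
# estimate_propensity_pairwise(design, strategy="auto")  # TypeError: removed parameter
# estimate_propensity_pairwise(design, method="auto")    # OK
```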
- Documentation: Fixed broken pairwise quick start example in README — corrected the function signature (`strategy` replaces the non-existent `autoscale_strategies`), added the required positional args (`metric_col`, `task_type`, `direction`), fixed return-value unpacking (the `(report, detailed)` tuple), and added explicit model fitting before evaluation.
- `evaluate_sklearn_models`: Raise `ValueError` immediately when `models` is empty or contains `None` values, preventing silent `(0, 0)` DataFrames and obscure downstream errors (#45; see the sketch below)
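A minimal sketch of the kind of fail-fast guard described above, assuming `models` is a name-to-estimator mapping; the real function's signature and error messages may differ.

```python
def _validate_models(models):
    # Fail fast instead of silently producing empty (0, 0) result frames.
    if not models:
        raise ValueError("`models` is empty; pass at least one fitted estimator")
    missing = [name for name, est in models.items() if est is None]
    if missing:
        raise ValueError(f"`models` contains None entries: {missing}")
```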
- Major Tech Debt: Implemented proper bootstrap confidence intervals using the existing `block_bootstrap_ci` function
- Statistical Accuracy: Replaced the incorrect normal approximation with a proper moving-block bootstrap for time-series data
- Dead Code Elimination: The `block_bootstrap_ci` function was implemented but never used (0% test coverage)
- False Advertising: README claimed "Bootstrap Confidence Intervals" but only used a normal approximation
- Comprehensive Test Suite: Added 14 new tests for the `block_bootstrap_ci` function, achieving 100% coverage
- Integration Tests: Added 7 integration tests for bootstrap CIs in both evaluation functions
- Proper Error Handling: Added parameter validation and graceful fallback mechanisms
- Enhanced Documentation: Updated README with detailed bootstrap CI parameters and usage
- Statistical Rigor: Bootstrap CIs now properly account for time-series correlation structure
- Data Science Best Practices: The moving-block bootstrap is a well-established method for time-series confidence intervals
- Test Coverage: Improved overall test coverage from 75% to 78%
- Code Quality: Eliminated dead code and improved maintainability
- Bootstrap Method: Uses a moving-block bootstrap with configurable block length (default: sqrt(n)); see the sketch after this list
- Fallback Strategy: Gracefully falls back to normal approximation if bootstrap fails
- Performance: 400 bootstrap samples by default with configurable parameters
- Reproducibility: All bootstrap operations use consistent random seeds
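For orientation, here is a generic moving-block bootstrap CI in numpy that mirrors the defaults listed above (block length ~ sqrt(n), 400 resamples, fixed seed). It is not necessarily the signature or implementation of the library's `block_bootstrap_ci`.

```python
import numpy as np

def moving_block_bootstrap_ci(x, stat=np.mean, n_boot=400, block_len=None,
                              alpha=0.05, seed=0):
    """Percentile CI from overlapping blocks, preserving short-range correlation."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    if block_len is None:
        block_len = max(1, int(np.sqrt(n)))      # default block length ~ sqrt(n)
    rng = np.random.default_rng(seed)            # fixed seed for reproducibility
    n_blocks = int(np.ceil(n / block_len))
    starts_max = n - block_len                   # last valid block start index
    stats = np.empty(n_boot)
    for b in range(n_boot):
        starts = rng.integers(0, starts_max + 1, size=n_blocks)
        sample = np.concatenate([x[s:s + block_len] for s in starts])[:n]
        stats[b] = stat(sample)
    lo, hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return lo, hi
```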
- Release Automation: Configured GitHub Actions workflow to trigger on pushed tags
- Automatic PyPI Publishing: Releases now automatically publish to PyPI when tags are pushed
- Developer Experience: Simplified release process - just push a tag to trigger publication
- Workflow Trigger: Added a `push.tags: ['v*']` trigger to the release workflow
- Backward Compatibility: Maintains the existing `release` and `workflow_dispatch` triggers
- Tag Pattern: Supports any version tag pattern (v1.0.0, v2.1.3, etc.)
- Comprehensive Type Safety: Enhanced type annotations throughout the codebase
- Sklearn Protocol Definitions: Added `ClassifierProtocol` and `RegressorProtocol` for better type safety
- Runtime Validation: Added validation for callable estimators to prevent runtime errors
- Enhanced Error Handling: Improved error messages and warnings for better debugging
- Major Tech Debt Resolution: Resolved critical type safety violations that were using `Any` types as workarounds
- Type Annotation Issues: Fixed the `induce_policy_from_sklearn` return type from `Any` to `np.ndarray`
- Mypy Compatibility: Resolved all mypy type checking errors with proper type annotations
- Callable Estimator Safety: Added runtime validation for callable estimators to prevent type errors
- Import Ordering: Fixed ruff linting issues with proper import organization
- Type Safety: Comprehensive protocols for sklearn estimators with proper method signatures
- Developer Experience: Better IDE support and static analysis capabilities
- Error Prevention: Runtime validation prevents common type-related runtime errors
- Code Quality: All linting, formatting, and type checking standards now pass
- Protocols: Added `ClassifierProtocol` with a `predict_proba` method for sklearn classifiers (see the sketch below)
- Validation: Runtime checks ensure callable estimators return compatible objects
- Type Inference: Improved type inference with explicit type annotations
- Compatibility: Maintains full backward compatibility while improving type safety
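A sketch of how such a structural protocol and runtime check can be written with `typing.Protocol`. The class name `ClassifierLike` and the method set shown here are illustrative; the library's actual `ClassifierProtocol` may include more methods.

```python
from typing import Protocol, runtime_checkable
import numpy as np

@runtime_checkable
class ClassifierLike(Protocol):
    """Structural type for sklearn-style classifiers (illustrative)."""
    def fit(self, X: np.ndarray, y: np.ndarray) -> "ClassifierLike": ...
    def predict_proba(self, X: np.ndarray) -> np.ndarray: ...

def check_classifier(est: object) -> None:
    # Runtime validation: reject estimators that lack the required methods.
    if not isinstance(est, ClassifierLike):
        raise TypeError(f"{type(est).__name__} does not provide fit/predict_proba")
```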
- Type Safety Violations: Resolved critical type annotation issues in core functions
- `induce_policy_from_sklearn`: Fixed return type annotation from `Any` to `np.ndarray`
- `EstimatorProtocol`: Added a proper protocol for sklearn estimators in `_get_outcome_estimator`
- `estimate_propensity_pairwise`: Fixed parameter type from `Any` to `PairwiseDesign`
- Type Safety Workarounds: Removed mypy workarounds that compromised type safety
- Major Tech Debt Resolution: Addressed critical type safety violations that were using `Any` types to avoid mypy issues
- Code Quality: Improved type safety and maintainability by using proper type annotations
- Developer Experience: Enhanced IDE support and static analysis capabilities
- Release Pipeline: Minor release pipeline fixes and dependency updates
- CI Build Failure: Added the missing `setuptools_scm` dependency to the release workflow
- Release Pipeline: Fixed a `ModuleNotFoundError` that prevented v0.3.0 from building successfully
- Package Publication: Ensured dynamic versioning works correctly in GitHub Actions
- Release Workflow: Enhanced build dependencies to include all required packages for successful builds
- State-of-the-Art (SOTA) Development Guidelines optimized for AI agents and human developers
- Comprehensive `DEVELOPMENT.md` with 400+ lines of AI agent-friendly development workflows
- Automated validation script (`scripts/validate_contribution.py`) for contribution quality assurance
- Error Prevention Strategy with comprehensive documentation and prevention mechanisms
- Branch protection setup guide (`.github/BRANCH_PROTECTION_SETUP.md`) for maintainers
- Enhanced `Makefile` with a `validate` command for comprehensive contribution checking
- CI-strict validation that mirrors GitHub Actions behavior exactly
- CONTRIBUTING.md with branch protection requirements and mandatory PR process
- Validation script with centralized configuration, AST-based docstring detection, and current branch display
- Pre-commit hooks integration with automated quality checks
- Zero tolerance for CI failures policy with preventive measures
- Comprehensive quality gates: linting, formatting, type checking, testing (80% coverage minimum)
- Git Flow branching strategy with protected main and develop branches
- Conventional commit message format requirements
- AI agent-specific guidelines with step-by-step workflows and troubleshooting
- Enterprise-grade development practices ensuring code movement via PRs with approvals
- Import order issues (PLC0415) in test files
- Code formatting consistency across all source directories
- Validation script encoding issues and Path usage improvements
- Pairwise evaluation system with comprehensive autoscaling strategies
- New `PairwiseDesign` class for pairwise comparison experiments
- Multiple autoscaling algorithms: `uniform`, `proportional`, `sqrt`, `log`, `inverse` (see the sketch after this list)
- Choice modeling functionality with propensity score estimation
- Comprehensive test suite for pairwise evaluation features
- Example notebook demonstrating pairwise evaluation usage
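The following is a speculative illustration of what per-strategy scaling weights could look like; the strategy names match the entry above, but the exact formulas and where the library applies them are assumptions, not the actual implementation.

```python
import numpy as np

def autoscale_weights(counts, strategy="uniform"):
    """Hypothetical per-group scaling weights for each named strategy."""
    counts = np.asarray(counts, dtype=float)
    if strategy == "uniform":
        w = np.ones_like(counts)            # every group weighted equally
    elif strategy == "proportional":
        w = counts                          # weight grows linearly with group size
    elif strategy == "sqrt":
        w = np.sqrt(counts)                 # dampened growth
    elif strategy == "log":
        w = np.log1p(counts)                # strongly dampened growth
    elif strategy == "inverse":
        w = 1.0 / np.maximum(counts, 1.0)   # favor small groups
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return w / w.sum()                      # normalize to a weight distribution
```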
- Resolved all mypy type annotation errors across codebase
- Fixed type incompatibilities between pandas and numpy types
- Improved type safety with proper conversions and annotations
- Enhanced pre-commit hooks configuration and installation
- Updated development workflow documentation
- Improved GitHub templates and CI workflows
- Professional development workflow with a `develop` branch
- Comprehensive contributing guidelines (`CONTRIBUTING.md`)
- Pull request and issue templates (bug report, feature request)
- Development Makefile with common tasks (check, lint, test, build, etc.)
- Comprehensive DEVELOPMENT.md guide for contributors
- Updated pre-commit configuration with latest tool versions
- CI workflow now runs on both `main` and `develop` branches
- CI workflow accepts PRs to both `main` and `develop` branches
- Updated deprecated GitHub Actions to latest versions
- Resolved mypy type annotation issues for Python 3.9 compatibility
- Applied comprehensive ruff formatting to all source files
- Permanently excluded the auto-generated `_version.py` from ruff checks
- Fixed ruff configuration error in `.ruff.toml` (moved `line-length` to the top level)
- Resolved 257+ linting issues across the codebase
- Updated GitHub Actions workflows to use latest action versions
- Updated deprecated `actions/upload-artifact@v3` to `v4`
- Updated deprecated `actions/setup-python@v4` to `v5`
- Updated type annotations to modern syntax (`dict`/`tuple` instead of `Dict`/`Tuple`); see the example below
- Applied comprehensive code style improvements
- All ruff quality checks now pass
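A brief illustration of the annotation style change (PEP 585 built-in generics, available on the Python 3.9+ baseline this project targets); the function names are made up for the example.

```python
from typing import Dict, Tuple

# Old style: requires imports from typing
def summarize_old(scores: Dict[str, float]) -> Tuple[float, float]:
    return min(scores.values()), max(scores.values())

# New style used in the codebase: built-in generics, no typing imports needed
def summarize_new(scores: dict[str, float]) -> tuple[float, float]:
    return min(scores.values()), max(scores.values())
```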
- Comprehensive v0.1.1 release notes
- Manual PyPI upload process documentation
## [0.1.0] - 2025-01-12
- Initial release of skdr-eval library
- Core implementation of Doubly Robust (DR) and Stabilized Doubly Robust (SNDR) estimators
- Time-aware cross-validation with proper timestamp sorting for offline policy evaluation
- Synthetic data generation for testing and evaluation (`make_synth_logs`)
- Design matrix construction with context and action features (`build_design`)
- Propensity score fitting with time-aware calibration (`fit_propensity_timecal`)
- Outcome model fitting with cross-validation (`fit_outcome_crossfit`)
- Policy induction from sklearn models (`induce_policy_from_sklearn`)
- Bootstrap confidence intervals with moving-block bootstrap for time-series data
- Comprehensive evaluation function for sklearn models (`evaluate_sklearn_models`)
- Complete test suite with 17 tests covering all major functionality
- CI/CD workflows for automated testing and building
- Comprehensive documentation with examples and API reference
- Quickstart example demonstrating full evaluation workflow
- 🎯 Doubly Robust Estimation: Implements both DR and Stabilized DR (SNDR) estimators (see the sketch after this list)
- ⏰ Time-Aware Evaluation: Uses time-series splits and calibrated propensity scores
- 🔧 Sklearn Integration: Easy integration with scikit-learn models
- 📊 Comprehensive Diagnostics: ESS, match rates, propensity score analysis
- 🚀 Production Ready: Type-hinted, tested, and documented
- 📈 Bootstrap Confidence Intervals: Moving-block bootstrap for time-series data
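As a pointer to what these estimators compute, here is a minimal numpy sketch of the standard DR and self-normalized (stabilized) DR point estimates for a deterministic target policy; the function and argument names are illustrative, not the library's API.

```python
import numpy as np

def dr_sndr(rewards, actions, policy_actions, propensities, q_logged, q_policy):
    """rewards[i]: observed reward; actions[i]: logged action;
    policy_actions[i]: action the target policy would choose;
    propensities[i]: estimated P(logged action | context);
    q_logged[i]: outcome-model prediction for the logged action;
    q_policy[i]: outcome-model prediction for the policy's action.
    Assumes at least one logged action matches the policy."""
    match = (actions == policy_actions).astype(float)
    weights = match / propensities                    # importance weights
    correction = weights * (rewards - q_logged)       # IPW correction of model bias
    dr = np.mean(q_policy + correction)               # doubly robust estimate
    sndr = np.mean(q_policy) + correction.sum() / weights.sum()  # self-normalized DR
    return dr, sndr
```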
- Supports Python 3.9+
- Dependencies: numpy, pandas, scikit-learn>=1.1
- Full type hints and comprehensive error handling
- 74% test coverage
- Follows modern Python packaging standards