All notable changes to XLens will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
- AutoExec functionality for `problematic-translogs`: Automated replica reset operations
  - New `--autoexec` flag: Automatically executes replica reset operations without manual intervention
  - New `--dry-run` flag: Safe simulation mode that shows what would be executed without making changes
  - New `--percentage` parameter: Filter tables by threshold percentage (default: 200%)
  - New `--max-wait` parameter: Configure timeout for retention lease monitoring (default: 720s)
  - New `--log-format` parameter: JSON logging support for container/Kubernetes environments
  - Robust state machine implementation: Handles set replicas → monitor leases → restore replicas workflow
  - Intelligent retry logic: Exponential backoff with configurable timeouts
  - Comprehensive error handling: Automatic rollback on failures with manual intervention guidance
  - Production-ready: Designed for Kubernetes CronJob automation
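The set replicas → monitor leases → restore replicas workflow can be sketched roughly as below. This is a minimal illustration and not XLens's actual code: `run_replica_reset`, `Outcome`, the callback parameters, and the temporary replica count of 0 are all hypothetical; only the 720s default for the wait limit comes from this changelog.

```python
import time
from enum import Enum
from typing import Callable


class Outcome(Enum):
    DONE = "done"
    ROLLED_BACK = "rolled_back"


def run_replica_reset(
    set_replicas: Callable[[int], None],
    leases_cleared: Callable[[], bool],
    original_replicas: int,
    temporary_replicas: int = 0,  # assumption: the value used during the reset
    max_wait: float = 720.0,      # default taken from --max-wait
    initial_delay: float = 1.0,   # hypothetical starting backoff delay
    sleep: Callable[[float], None] = time.sleep,
) -> Outcome:
    """Drive the three workflow steps with exponential backoff while polling."""
    # Step 1: set replicas to the temporary value.
    set_replicas(temporary_replicas)

    # Step 2: monitor retention leases, doubling the delay after each poll.
    waited, delay = 0.0, initial_delay
    timed_out = False
    while not leases_cleared():
        if waited >= max_wait:
            timed_out = True
            break
        sleep(delay)
        waited += delay
        delay *= 2

    # Step 3: restore replicas. On timeout this doubles as the rollback.
    set_replicas(original_replicas)
    return Outcome.ROLLED_BACK if timed_out else Outcome.DONE
```

Passing `sleep` and the two callbacks in as parameters keeps the sketch testable without a cluster; a real implementation would issue the SQL statements inside them.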
- New `read-check` command: Professional cluster data readability monitor
  - Continuously monitors the 5 largest tables/partitions using `max(_seq_no)` to detect data changes
  - Health status indicators: 🟢 Active, 🟡 Slow, 🔴 Stale tables
  - Query performance tracking with ⚡ alerts for >1000ms queries
  - Fresh connections with exponential backoff retry logic
  - Enhanced statistics with per-table metrics on exit (CTRL+C)
  - Automatic table discovery every 10 minutes with partition support
  - Professional logging format: `timestamp: schema.table // _seq_no ±X // total_docs ±Y // XXXms`
  - Anomaly detection with ⚠️ indicators for unusual activity patterns
  - Optimized `max(_seq_no)` queries instead of expensive ORDER BY + LIMIT operations
  - Removed obsolete `--limit` parameter (no longer needed with max() aggregation)
  - Uses CrateDB's actual query execution time instead of network RTT for performance metrics
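As an illustration of the logging format above, the following sketch renders one line per table check. The function name, the ISO timestamp style, and the placement of the ⚡ marker are assumptions made for this example; only the field layout and the >1000ms alert threshold come from this changelog.

```python
from datetime import datetime

SLOW_QUERY_MS = 1000  # alert threshold named in the changelog


def format_read_check_line(
    timestamp: datetime,
    schema: str,
    table: str,
    seq_no_delta: int,
    docs_delta: int,
    query_ms: float,
) -> str:
    """Render 'timestamp: schema.table // _seq_no ±X // total_docs ±Y // XXXms'."""
    line = (
        f"{timestamp.isoformat(timespec='seconds')}: {schema}.{table}"
        f" // _seq_no {seq_no_delta:+d}"
        f" // total_docs {docs_delta:+d}"
        f" // {query_ms:.0f}ms"
    )
    # Hypothetical placement: append the alert marker for slow queries.
    if query_ms > SLOW_QUERY_MS:
        line += " ⚡"
    return line
```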
- Critical CrateDB response handling bug: Fixed AutoExec operations incorrectly reported as failures
  - Issue: Successful ALTER TABLE operations were marked as "Failed: Unknown error"
  - Root cause: Code checked for 'success' field that CrateDB doesn't return in HTTP responses
  - Fix: Changed to check for absence of 'error' field instead of presence of 'success' field
  - Impact: AutoExec operations now correctly report success/failure status
  - Evidence: Operations were actually succeeding (retention leases changed) but reported as failed
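The corrected check can be sketched as follows. `describe_result` is a hypothetical helper, not the function used in XLens, and the response shapes in the usage example are simplified. The sketch treats a response as successful when it has no `error` key.

```python
from typing import Any, Mapping


def describe_result(response: Mapping[str, Any]) -> str:
    """Classify a parsed CrateDB HTTP response by the absence of 'error'."""
    if "error" not in response:
        return "Success"
    error = response["error"]
    # Assumption: the error payload is a mapping that carries a 'message' field.
    if isinstance(error, Mapping):
        message = error.get("message", "Unknown error")
    else:
        message = str(error)
    return f"Failed: {message}"


ok = {"cols": [], "rows": [], "rowcount": 1, "duration": 12.3}
failed = {"error": {"message": "SQLParseException[...]", "code": 4000}}
print(describe_result(ok))
print(describe_result(failed))
```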
- Hardcoded threshold bug in AutoExec filtering: Fixed percentage calculations using wrong thresholds
  - Issue: AutoExec percentage filtering used hardcoded 563MB for all tables instead of adaptive thresholds
  - Fix: Modified `_filter_tables_by_percentage()` to use actual table-specific adaptive thresholds
  - Impact: AutoExec now respects individual table configurations instead of using one-size-fits-all approach
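To show why a single hardcoded threshold misclassifies tables, here is a small sketch of percentage filtering against per-table thresholds. It is not the body of `_filter_tables_by_percentage()`; the function name, the dictionary keys, and the reading of `--percentage` as "translog size relative to the table's threshold" are assumptions made for illustration.

```python
from typing import Dict, List, Union

Table = Dict[str, Union[str, float]]


def filter_by_percentage(tables: List[Table], percentage: float = 200.0) -> List[str]:
    """Keep tables whose translog size is at least `percentage` of their own threshold."""
    selected = []
    for table in tables:
        ratio = float(table["translog_mb"]) / float(table["threshold_mb"]) * 100
        if ratio >= percentage:
            selected.append(str(table["name"]))
    return selected


tables = [
    {"name": "a", "translog_mb": 1200.0, "threshold_mb": 563.2},   # about 213%
    {"name": "b", "translog_mb": 3000.0, "threshold_mb": 2252.8},  # about 133%
]
print(filter_by_percentage(tables))
```

Under these assumptions only table `a` passes the 200% default. Measured against a fixed 563MB threshold, table `b` would have been selected as well, even though it is well below 200% of its own threshold.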
- Adaptive threshold override bug: Fixed `--sizeMB` parameter being ignored in favor of adaptive thresholds
  - Issue: User-specified `--sizeMB 50` was ignored, shards filtered using adaptive threshold (563.2MB) instead
  - Root cause: `_apply_adaptive_thresholds()` compared translog sizes against adaptive threshold, not user's `--sizeMB`
  - Fix: Changed filtering to always respect `--sizeMB` as minimum threshold; adaptive thresholds shown for reference only
  - Impact: `--sizeMB` parameter now works as expected - shows all shards exceeding the specified size
  - Example: With `--sizeMB 50`, shards at 542MB, 360MB, 301MB now correctly appear (previously: 0 shards shown)
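The example in this entry can be reproduced with a short sketch. `shards_over_size` is a hypothetical stand-in for the real filtering code, using the shard sizes quoted above.

```python
from typing import List


def shards_over_size(shard_sizes_mb: List[float], size_mb: float = 512.0) -> List[float]:
    """Return shards whose translog exceeds the user-specified minimum size."""
    return [size for size in shard_sizes_mb if size > size_mb]


shards = [542.0, 360.0, 301.0]
print(shards_over_size(shards, size_mb=50))     # fixed behaviour: --sizeMB 50
print(shards_over_size(shards, size_mb=563.2))  # old behaviour: adaptive threshold
```

With the user's value of 50 all three shards are listed; comparing against 563.2MB instead yields an empty list, which matches the "0 shards shown" symptom.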
- Enhanced `problematic-translogs` command: Adaptive threshold detection based on table settings
  - Default `--sizeMB` changed from 300MB to 512MB (CrateDB default flush threshold)
  - Adaptive thresholds: Uses table-specific `flush_threshold_size * 1.1` for informational display (does NOT override `--sizeMB`)
  - `--sizeMB` parameter: Always respected as minimum threshold regardless of adaptive thresholds
  - Performance optimized: Only queries table settings for tables with initially problematic shards
  - Enhanced display: Shows both configured value and calculated threshold (e.g., "2048MB/2253MB config/threshold")
  - Partition support: Handles partition-specific `flush_threshold_size` settings
  - Clean CLI: Simplified help text for better usability
  - Fixed REROUTE CANCEL commands to include partition information for partitioned tables
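The arithmetic behind the displayed values is small enough to sketch. The function names below are hypothetical, and rounding to whole megabytes is an assumption inferred from the "2048MB/2253MB" example.

```python
def adaptive_threshold_mb(flush_threshold_mb: float) -> float:
    """Informational threshold: the table's flush_threshold_size times 1.1."""
    return flush_threshold_mb * 1.1


def format_threshold(flush_threshold_mb: float) -> str:
    """Render the 'config/threshold' pair, rounded to whole megabytes."""
    threshold = adaptive_threshold_mb(flush_threshold_mb)
    return f"{flush_threshold_mb:.0f}MB/{threshold:.0f}MB config/threshold"


print(format_threshold(2048))      # 2048 * 1.1 = 2252.8
print(adaptive_threshold_mb(512))  # approximately 563.2
```

For the 512MB default, the same formula gives roughly 563.2MB, the adaptive threshold figure quoted in the bug-fix entries of this changelog.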
- Enhanced SQL logging: Complete transparency for AutoExec operations
  - Dry-run mode: Shows "DRY RUN: Would execute: SQL" for all operations
  - Regular mode: Shows "Executing: SQL" before actual database execution
  - JSON logging: Uses loguru with structured data (consistent with `read-check` command)
  - Rollback operations: Clear logging for failure recovery attempts
  - Benefit: Full audit trail and debugging visibility for all database operations
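The two logging modes can be illustrated with a small wrapper. This sketch uses a plain callable for logging so that it runs with the standard library alone (the tool itself uses loguru); `run_sql` and its parameters are hypothetical names.

```python
from typing import Callable


def run_sql(
    sql: str,
    execute: Callable[[str], None],
    dry_run: bool,
    log: Callable[[str], None] = print,
) -> bool:
    """Log the statement, then execute it unless dry-run mode is active.

    Returns True if the statement was passed to `execute`.
    """
    if dry_run:
        log(f"DRY RUN: Would execute: {sql}")
        return False
    log(f"Executing: {sql}")
    execute(sql)
    return True


statement = 'ALTER TABLE "doc"."metrics" SET ("number_of_replicas" = 0)'
run_sql(statement, execute=lambda _sql: None, dry_run=True)
```

Logging before the call to `execute` means the statement appears in the audit trail even when execution raises.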
- Consistent loguru usage: Both `read-check` and `problematic-translogs --autoexec` use loguru for structured logging
- Enhanced per-table statistics: Shows document change tracking and performance metrics
  - Document changes: Total change with min/avg/max deltas
  - Performance: Query response time min/avg/max
  - Anomaly counter per table
- Query optimization: `read-check` uses efficient `max(_seq_no)` aggregation instead of sorting
- Parameter cleanup: Removed obsolete `--limit` parameter from `read-check` command
- Performance measurement: `read-check` now uses database execution time from CrateDB response instead of measuring network round-trip time
- Enhanced `loguru>=0.7.0` usage: AutoExec now uses loguru for JSON logging (consistent with `read-check` command)
- Comprehensive AutoExec test coverage: 16 focused tests covering essential business scenarios
  - `test_autoexec.py`: Consolidated test suite covering replica reset workflows, dry-run safety, adaptive thresholds, CLI validation, and error handling
  - Test philosophy: Focus on business scenarios rather than implementation details
  - Coverage includes: Regular and partitioned table resets, timeout handling, dry-run simulation, percentage-based filtering, partial/complete failure scenarios, CLI flag validation
  - Reduced test complexity by 77.7% (from 1,998 → 445 lines) while maintaining comprehensive coverage
- Updated main README.md with `read-check` command reference and AutoExec functionality
- Added comprehensive `docs/read-check.md` with usage guide and examples
- Updated `docs/README.md` with data readability monitoring section
- Added `SQL_LOGGING_AND_BUGFIX_SUMMARY.md`: Technical documentation of bug fixes and enhancements
- Created verification scripts: `verify_dry_run_safety.py` and `verify_adaptive_thresholds.py`
- [Unreleased]: Features ready but not yet in a tagged release
- [X.Y.Z]: Released versions with date
- Added: New features
- Changed: Changes to existing functionality
- Deprecated: Soon-to-be removed features
- Removed: Removed features
- Fixed: Bug fixes
- Security: Vulnerability fixes
This project will follow conventional commits for automatic changelog generation:
- `feat:` for new features (minor version bump)
- `fix:` for bug fixes (patch version bump)
- `docs:` for documentation changes
- `refactor:` for code refactoring
- `test:` for adding tests
- `chore:` for maintenance tasks
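A rough sketch of how these prefixes could map to version bumps is shown below. `bump_for` and `BUMP_BY_TYPE` are hypothetical names, and the optional `(scope)` handling is an assumption based on the Conventional Commits format. Types without a stated bump in the list above return `None`.

```python
from typing import Optional

# Only the bumps stated in this changelog; other types imply no bump here.
BUMP_BY_TYPE = {"feat": "minor", "fix": "patch"}


def bump_for(commit_subject: str) -> Optional[str]:
    """Return 'minor', 'patch', or None for a commit subject line."""
    prefix, separator, _ = commit_subject.partition(":")
    if not separator:
        return None  # no conventional-commit prefix present
    # Drop an optional scope such as "fix(autoexec)".
    commit_type = prefix.split("(", 1)[0].strip()
    return BUMP_BY_TYPE.get(commit_type)


print(bump_for("feat: add read-check command"))
print(bump_for("fix(autoexec): check for absence of error field"))
print(bump_for("docs: update README"))
```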
## [1.2.0] - 2024-02-15
### Added
- New feature X with capability Y
### Fixed
- Bug where Z caused unexpected behavior
### Changed
- Improved performance of command A