Deepfreeze release #1792
Open: wortmanb wants to merge 289 commits into elastic:deepfreeze8 from wortmanb:RC1
This wasn't working when trying to map with filters.
I added several new options and adjusted others so that we can now specify --rotate_by and choose bucket or path. The suffix then gets applied to either the bucket name or the path name, depending on that choice. The repo name always gets the suffix.
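As a rough illustration of the --rotate_by behavior described above, here is a minimal sketch. The function name, argument names, and the "-" separator are assumptions made for the example and are not taken from the deepfreeze code.

```python
def apply_suffix(repo_prefix, bucket_name, base_path, suffix, rotate_by):
    """Return (repo_name, bucket, path) with the suffix applied per rotate_by.

    Hypothetical helper: names and the "-" separator are illustrative only.
    """
    if rotate_by not in ("bucket", "path"):
        raise ValueError(f"rotate_by must be 'bucket' or 'path', got {rotate_by!r}")

    # The repo name always gets the suffix, whatever rotate_by says.
    repo_name = f"{repo_prefix}-{suffix}"

    if rotate_by == "bucket":
        # Rotate by bucket: a new bucket per rotation, same path.
        return repo_name, f"{bucket_name}-{suffix}", base_path

    # Rotate by path: same bucket, a new path per rotation.
    return repo_name, bucket_name, f"{base_path}-{suffix}"
```

Rotating by path keeps everything in one bucket, which may be preferable where bucket counts are limited; that trade-off is an observation about the sketch, not a statement about the project's defaults.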
Switched most settings to being part of a Settings object. Completed updating Rotate up through ILM changes. Fully implemented style.
Verified and fixed code for removing old repositories.
For oneup, at least. Need to ensure this works for date-based rotation too.
Removed commented-out code now that I know it's safe
Finally got black configured and disabled Flake. Much happier now.
Templated these, which we'll use to track repos and thawsets inside the status index in Elasticsearch.
Unit tests for utility classes used by DeepFreeze.
These tests cover all remaining utility (module-level) functions. They could perhaps be collected into a single file.
I plan to do this wherever possible, and anywhere it doesn't cause more problems than it solves.
This is almost certainly incomplete, but I'll add to it as we go along.
This completely breaks a number of things, but I wanted to capture it mid-stream so as not to lose it. Flaky network at BAH.
Set defaults for this code formatter, which is faster than black but can format just as well and to the same standard.
Switched to Ruff. It really wants " instead of '.
Added s3client.py to encapsulate S3 client code for various providers under a consistent interface. Includes the S3Client class and its implementation classes, plus a factory method to return a client object for a particular provider.
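The shape of that design (an abstract client, provider-specific implementations, and a factory) might look roughly like the sketch below. Only the name S3Client comes from the commit message; the method, the in-memory implementation, the provider key, and the factory name are hypothetical stand-ins so the example runs without cloud credentials.

```python
from abc import ABC, abstractmethod


class S3Client(ABC):
    """Provider-neutral interface; the single method here is illustrative."""

    @abstractmethod
    def create_bucket(self, bucket_name: str) -> None:
        """Create a bucket with the given name."""


class InMemoryS3Client(S3Client):
    """Hypothetical stand-in implementation that stores bucket names in a set."""

    def __init__(self) -> None:
        self.buckets: set[str] = set()

    def create_bucket(self, bucket_name: str) -> None:
        self.buckets.add(bucket_name)


# Registry of provider name -> implementation class (assumed layout).
_PROVIDERS = {"memory": InMemoryS3Client}


def s3_client_factory(provider: str) -> S3Client:
    """Return a client for the named provider, or raise for unknown names."""
    try:
        return _PROVIDERS[provider]()
    except KeyError:
        raise ValueError(f"Unsupported provider: {provider!r}") from None
```

Callers depend only on the S3Client interface, so adding a provider means registering one more class rather than touching the calling code.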
Also made some updates to deepfreeze.py to comply with testing better.
Allows us to persist more details about the repo.
We now create and assign a new "frozen-only" ilm policy to each thawed index, based on the repository it was thawed from. This prevents all thawed indices from showing up on Index Management as having lifecycle errors.
Users can still list all by adding --include-completed or -c
1. Added Status Constants (constants.py)
- Added THAW_STATUS_IN_PROGRESS, THAW_STATUS_COMPLETED, THAW_STATUS_FAILED, and THAW_STATUS_REFROZEN constants
- Created THAW_REQUEST_STATUSES list for validation
2. Updated Refreeze Action (refreeze.py)
- Changed status from "completed" to THAW_STATUS_REFROZEN when refreeze completes
- Now properly indicates that thawed data has been cleaned up and returned to frozen state
3. Added Retention Setting (helpers.py)
- Added thaw_request_retention_days_refrozen setting (default: 35 days)
- This aligns with the 30-day max for data to return to Glacier, plus 5 days buffer
4. Updated Cleanup Logic (cleanup.py)
- Added handling for "refrozen" status in both _cleanup_old_thaw_requests() and dry-run mode
- Refrozen requests are automatically deleted after 35 days
5. Updated Thaw List Filtering (thaw.py - do_list_requests())
- Now excludes both "completed" AND "refrozen" requests by default
- Use --include-completed or -c flag to see all requests
- Updated help messages to reflect "completed/refrozen" filtering
6. Updated Status Checking (thaw.py)
- do_check_status(): Skips refrozen requests with helpful message
- do_check_all_status(): Filters out refrozen requests before processing
Status Lifecycle
The complete thaw request lifecycle is now:
1. in_progress → Thaw operation is actively running
2. completed → Thaw succeeded, data is available and mounted
3. refrozen → Data has been cleaned up via refreeze (new!)
4. failed → Thaw operation failed
Retention Periods (Cleanup)
- Completed: 7 days (default)
- Failed: 30 days (default)
- Refrozen: 35 days (new!)
All syntax validation passed! The new status properly distinguishes between "thaw completed and data available" vs "thaw was completed but has been cleaned up."
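To make the status lifecycle and retention periods above concrete, here is a minimal sketch. The constant names come from the commit message; their string values are assumed to match the lifecycle names listed, and RETENTION_DAYS and is_expired are hypothetical helpers, not the cleanup.py implementation.

```python
from datetime import datetime, timedelta, timezone

# Constant names are from the commit message; string values are assumed.
THAW_STATUS_IN_PROGRESS = "in_progress"
THAW_STATUS_COMPLETED = "completed"
THAW_STATUS_FAILED = "failed"
THAW_STATUS_REFROZEN = "refrozen"

THAW_REQUEST_STATUSES = [
    THAW_STATUS_IN_PROGRESS,
    THAW_STATUS_COMPLETED,
    THAW_STATUS_FAILED,
    THAW_STATUS_REFROZEN,
]

# Default retention per terminal status, in days (values from the list above).
RETENTION_DAYS = {
    THAW_STATUS_COMPLETED: 7,
    THAW_STATUS_FAILED: 30,
    THAW_STATUS_REFROZEN: 35,  # 30-day Glacier return window plus 5 days buffer
}


def is_expired(status: str, finished_at: datetime, now: datetime) -> bool:
    """Return True if a thaw request is older than its retention period."""
    if status not in THAW_REQUEST_STATUSES:
        raise ValueError(f"Unknown thaw status: {status!r}")
    days = RETENTION_DAYS.get(status)
    if days is None:
        # In-progress requests have no retention period in this sketch.
        return False
    return now - finished_at > timedelta(days=days)
```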
Added descriptions of all actions in markdown.
Due to issues in rotate, not all repos were being marked 'frozen'. This necessitated adding repair_metadata, which can be used should this ever occur again and serves as a foundation for other potential repair work in the future. Updated integration tests and fixes revealed by testing.
1. Parallelized AWS S3 API Calls (10-15x speedup on S3 checks)
File: curator/actions/deepfreeze/utilities.py
- Modified check_restore_status() to use ThreadPoolExecutor with 15 concurrent workers
- Instead of checking objects sequentially (one by one), now checks up to 15 objects in parallel
- This is the biggest win - transforms sequential 10,000 API calls from 16+ minutes to ~1 minute
Technical details:
- boto3 client is thread-safe, making this safe to implement
- Separates instant-access objects (no check needed) from Glacier objects (need parallel checking)
- Uses concurrent.futures.as_completed() to process results as they arrive
2. Eliminated Redundant Status Checks (2x speedup on overall flow)
Files: curator/actions/deepfreeze/thaw.py
- Added status caching in both do_check_status() and do_check_all_status()
- Modified _display_thaw_status() to accept optional cache parameter
- Previously called check_restore_status() twice per repository (once for logic, once for display)
- Now caches results from first check and reuses for display
3. Added Progress Indicators (UX improvement)
Files: curator/actions/deepfreeze/thaw.py
- Shows "Checking repository X of Y..." as each repository is processed
- Gives users real-time feedback instead of appearing frozen
- Uses existing rich library for clean terminal output
4. Code Quality
- All changes pass black formatting
- All changes pass ruff linting
- Backward compatible - no API changes
Expected Performance Improvement
Before: ~11 minutes (660 seconds)
After: ~1-2 minutes (60-120 seconds)
Overall speedup: 5-10x faster!
Breakdown:
- S3 API calls: 16 minutes → ~1 minute (15x faster)
- Redundant checks eliminated: Cut remaining time in half
- Total: 11 minutes → 1-2 minutes
The exact improvement depends on:
- Number of thaw requests
- Number of repositories per request
- Number of objects per repository
- Network latency to AWS S3
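The parallel-check pattern described in item 1 can be sketched as follows. This is not the check_restore_status() code: check_objects_in_parallel and its check_one argument are hypothetical, with check_one standing in for whatever per-object S3 call the real code makes.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def check_objects_in_parallel(keys, check_one, max_workers=15):
    """Run check_one(key) for every key on a thread pool.

    check_one is a caller-supplied callable (hypothetical here); it should
    return the restore status for a single object key.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to its key so results can be attributed.
        futures = {pool.submit(check_one, key): key for key in keys}
        # as_completed yields futures as they finish, not in submission order.
        for future in as_completed(futures):
            results[futures[future]] = future.result()
    return results
```

Threads suit this workload because each check spends most of its time waiting on the network. A failing check_one call re-raises from future.result(), so the real code would need to decide whether one failed object should abort the whole check.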
Summary of Changes
1. CLI Command (curator/cli_singletons/deepfreeze.py:344-370)
Added the -f/--refrozen-retention-days option to the cleanup command:
- Short flag: -f (mnemonic for "refrozen")
- Long flag: --refrozen-retention-days
- Type: integer
- Default: None (uses config setting, typically 35 days)
2. Cleanup Action (curator/actions/deepfreeze/cleanup.py)
- Updated __init__ to accept refrozen_retention_days parameter
- Modified _cleanup_old_thaw_requests() to use CLI override if provided, otherwise fall back to settings value
- Applied same logic to do_dry_run() method for consistent behavior
- Updated class docstring to document the new parameter
3. Schema Validation
Added validation in two places:
- option_defaults.py: Created refrozen_retention_days() function with validation (1-365 days range, None allowed)
- validators/options.py: Added the option to cleanup's validation schema
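The validation range and the CLI-override fallback described above could be expressed along these lines. This is a plain-Python sketch under the stated rules (1-365 days, None allowed); both function names are hypothetical, and the project's actual schema code in option_defaults.py may be structured quite differently.

```python
def validate_refrozen_retention_days(value):
    """Accept None or an integer in the range 1-365; reject anything else."""
    if value is None:
        return None
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError("refrozen_retention_days must be an integer or None")
    if not 1 <= value <= 365:
        raise ValueError("refrozen_retention_days must be between 1 and 365")
    return value


def effective_retention_days(cli_value, settings_value=35):
    """Prefer the CLI override when given, else fall back to the settings value."""
    validated = validate_refrozen_retention_days(cli_value)
    return validated if validated is not None else settings_value
```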
1. Added NotFoundError import (line 7) - imported the specific exception type from elasticsearch8 to handle repository not found errors
2. Added specific exception handling (lines 210-223) - added a new exception handler that:
- Specifically catches NotFoundError before the generic exception handler
- Detects when the error is a repository_missing_exception (indicating the repository has already been unmounted)
- Logs an INFO level message instead of ERROR: "Repository {name} has already been unmounted, no indices to delete"
- Returns gracefully with no indices deleted
- For other NotFoundError cases, logs a WARNING instead of ERROR
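The handler ordering described above can be sketched as below. To keep the example runnable without the elasticsearch8 package, NotFoundError here is a local stand-in class, and the string match on the error text is an assumed way to detect repository_missing_exception. The function name, the list_indices callable, and returning an empty list for other NotFoundError cases are also assumptions.

```python
import logging

logger = logging.getLogger("deepfreeze.sketch")


class NotFoundError(Exception):
    """Local stand-in for elasticsearch8.NotFoundError (sketch only)."""


def indices_to_delete(repo_name, list_indices):
    """Return the indices backed by repo_name, tolerating a missing repository.

    list_indices is a hypothetical callable that raises NotFoundError when
    the repository no longer exists.
    """
    try:
        return list_indices(repo_name)
    except NotFoundError as err:
        # The specific handler must come before the generic one below.
        if "repository_missing_exception" in str(err):
            logger.info(
                "Repository %s has already been unmounted, no indices to delete",
                repo_name,
            )
        else:
            logger.warning("Not found while listing %s: %s", repo_name, err)
        return []  # assumption: nothing is deleted in either case
    except Exception as err:
        logger.error("Failed to list indices for %s: %s", repo_name, err)
        raise
```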
Show counts in thaw list output
Detect and fix situation where a thaw request is submitted, acted upon by AWS, but ignored by the requestor. If check-status is run after the data is refrozen by AWS, this detects that and fixes the metadata to show the request as being refrozen so it doesn't languish as a pending request.
Updated test description to reflect the integration tests' unreliable nature.
…er guide
- Add detailed overview and architecture documentation
- Document all actions: setup, rotate, status, thaw, refreeze, cleanup, repair-metadata
- Include quick start guide and common workflows
- Add cost optimization and scheduling recommendations
- Document ILM integration and troubleshooting
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
As tested as I can make it, here's the first releasable version of deepfreeze for curator. Unit tests all pass; integration tests are a work in progress as they take so long to run that it's really difficult, and parallelizing them hasn't worked very well either.