Add support for snapshots with zstd compression #1523
Conversation
Could we update this PR / rename it so it is oriented towards the customer problem being solved? It seems like that would be "Add support for snapshots with zstd compression".
Let's also add test cases that exercise this behavior. What do you think about adding a new index onto the existing EndToEnd test cases that enables this level of compression?
Force-pushed from 99b55d2 to a3fb764
@@ -20,6 +20,7 @@ public class Elasticsearch816CodecFallback extends Codec {
  public Elasticsearch816CodecFallback() {
    super("Elasticsearch816");
    System.out.println(">>>>> Loading stub Elasticsearch816CodecFallback class");
Can we remove all these prints? We can use logger.debug if you want to retain them.
Did I miss where we've added zstd support? I don't see any libraries or updated compression configuration for our Lucene readers.
def run_test_benchmarks(self, cluster: Cluster):
    run_test_benchmarks(cluster=cluster)

def disable_bloom(self, cluster: Cluster, index_name: str):
Aren't indices with bloom filters supported via IgnoreBloomFilter? Why would we need to disable them?
So that we do not carry this setting forward to OpenSearch versions during metadata migration.
Overall I would say the metadata process should not move unsupported features from ES8 -> OpenSearch. That seems to be the case for this bloom feature, no?
@@ -28,6 +28,7 @@ public class IgnorePsmPostings extends PostingsFormat {
  public IgnorePsmPostings() {
    super("ES812Postings");
    System.out.println(">>>>> Loading stub IgnorePsmPostings class");
Please remove all usage of System.out.println(...). Does it make sense to include debug-level logging statements for these instead?
Yes, I will remove the usage of System.out.println and use log4j wherever I want to opt in to temporary logging.
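For reference, a minimal sketch of that swap on the stub shown above (assuming log4j2 is on the classpath; the placeholder method bodies exist only so the sketch compiles):

import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;
import org.apache.lucene.codecs.FieldsConsumer;
import org.apache.lucene.codecs.FieldsProducer;
import org.apache.lucene.codecs.PostingsFormat;
import org.apache.lucene.index.SegmentReadState;
import org.apache.lucene.index.SegmentWriteState;

public class IgnorePsmPostings extends PostingsFormat {
    private static final Logger logger = LogManager.getLogger(IgnorePsmPostings.class);

    public IgnorePsmPostings() {
        super("ES812Postings");
        // Debug-level so the load trace is still available for troubleshooting without polluting stdout.
        logger.debug("Loading stub IgnorePsmPostings class");
    }

    // Placeholder bodies so the sketch compiles; the real stub's behavior is unchanged.
    @Override
    public FieldsConsumer fieldsConsumer(SegmentWriteState state) {
        throw new UnsupportedOperationException("read-only stub");
    }

    @Override
    public FieldsProducer fieldsProducer(SegmentReadState state) {
        throw new UnsupportedOperationException("read-only stub");
    }
}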
System.out.println(">>>>> Injecting missing mode BEST_SPEED into segment info"); | ||
si.putAttribute("Lucene90StoredFieldsFormat.mode", "BEST_SPEED"); | ||
} | ||
return new Lucene90StoredFieldsFormat(Mode.BEST_SPEED).fieldsReader(directory, si, fn, context); |
Why choose BEST_SPEED? Can we move this into a static field? The 'why' would be a good comment to include on that field.
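A minimal sketch of that refactor (class and member names are illustrative; the rationale comment is left as a placeholder, since the 'why' is exactly the open question):

import org.apache.lucene.codecs.lucene90.Lucene90StoredFieldsFormat.Mode;
import org.apache.lucene.index.SegmentInfo;

public final class StoredFieldsModeFallback {
    /** Attribute key the Lucene90 stored-fields reader consults to pick its decompression parameters. */
    private static final String MODE_ATTRIBUTE = "Lucene90StoredFieldsFormat.mode";

    /**
     * Mode injected when a segment carries no mode attribute. BEST_SPEED is what the current
     * draft assumes; the reason for preferring it over BEST_COMPRESSION should be documented
     * here once confirmed.
     */
    private static final Mode FALLBACK_MODE = Mode.BEST_SPEED;

    /** Inject the fallback mode only when the attribute is missing from the segment info. */
    static void ensureStoredFieldsMode(SegmentInfo si) {
        if (si.getAttribute(MODE_ATTRIBUTE) == null) {
            si.putAttribute(MODE_ATTRIBUTE, FALLBACK_MODE.name());
        }
    }
}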
I do not plan on keeping this file at all, and hence it has not been referenced anywhere else.
@@ -29,6 +29,28 @@
    "sonested": {"count": "1000"},
    "nyc_taxis": {"count": "1000"}
}
empty_indices_no_taxi = {
These do not appear to be referenced?
I will refine this. I included them only for testing purposes, as a checkpoint to see whether the test passes without nyc_taxis. I agree many things are not clean in this PR, and I will clean them up one by one. Thank you for bringing them to my attention.
logger.info(f"Successfully disabled bloom filter on index: {index_name}. Response: {response}")

def refresh(self, cluster: Cluster, index_name: str):
Thanks for adding this. Please update check_doc_counts_match to use this refresh definition.
Will do.
class Test0007EndToEndTestForES8WithOSBenchmarks(MATestBase):
    def __init__(self, console_config_path: str, console_link_env: Environment, unique_id: str):
        allow_combinations = [
            (ElasticsearchV8_X, OpensearchV2_X)
Can we avoid creating a new test and add this matrix value onto an existing test case?
public StoredFieldsReader fieldsReader(Directory directory, SegmentInfo si, FieldInfos fn, IOContext context) throws IOException {
    System.out.println(">>> Attempting to decode stored fields using ZstdStoredFields814Format for segment: " + si.name);
    // TODO: Replace with real reader implementation
    return new Lucene90CompressingStoredFieldsFormat(
I think that importing the custom-codecs implementation would be a better path than rolling our own. It looks like Lucene912CustomStoredFieldsFormat might be the file we are looking for (link).
Here it is on Maven: https://mvnrepository.com/artifact/org.opensearch.plugin/opensearch-custom-codecs
Since we are using Lucene 9, make sure to use their 2.19.2 artifact.
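Assuming the reader module is built with Gradle, pulling that artifact in would look something like this (coordinates taken from the Maven link above):

dependencies {
    // custom-codecs 2.19.x is the line built against Lucene 9, per the comment above
    implementation 'org.opensearch.plugin:opensearch-custom-codecs:2.19.2'
}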
@jugal-chauhan Why aren't we importing the custom codecs?
Force-pushed from c44b2f7 to 3b728c7
@jugal-chauhan I'm not sure what the end goal of this pull request is. Can you update the description and describe what you want to accomplish by merging these changes?
Description
This PR brings in no-op implementations for ignoring Bloom Filters and for skipping the Elasticsearch816 codec, which is required due to certain default index settings in ES 8.x. It also focuses on adding the new codec into our Lucene 9 readers so they can successfully read zstd-compressed Lucene segments.
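As a rough illustration of the codec-skipping idea (a sketch only; the fallback class in this PR extends Codec directly and overrides the formats it needs), a codec can be registered under the name ES 8.16 wrote into the segments while delegating everything else to a stock Lucene codec:

import org.apache.lucene.codecs.Codec;
import org.apache.lucene.codecs.FilterCodec;

public class Elasticsearch816CodecFallback extends FilterCodec {
    public Elasticsearch816CodecFallback() {
        // The name must match the codec name recorded in the ES 8.16 segment infos so that
        // Codec.forName() resolves to this class when the snapshot's segments are opened.
        // The class is exposed to Lucene's SPI via META-INF/services/org.apache.lucene.codecs.Codec.
        super("Elasticsearch816", Codec.getDefault());
    }
}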
Issues Resolved
MIGRATIONS-2536
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.