Skip to content

Add text field support to archive indices #86591

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 30 commits into from
May 18, 2022

Conversation

ywelsch
Copy link
Contributor

@ywelsch ywelsch commented May 10, 2022

Adds support for "text" fields in archive indices, with the goal of adding simple filtering support on text fields when querying archive indices.

There are some differences to regular text fields:

  • no global statistics: queries on text fields return constant score (similar to match_only_text).
  • analyzer fields can be updated
  • if defined analyzer is not available, falls back to default analyzer
  • no guarantees that analyzers are BWC

The above limitations also give us the flexibility to eventually swap out the implementation with a "runtime-text field" variant, and hence only provide those capabilities that can be emulated via a runtime field.

Relates #81210

@@ -92,7 +89,7 @@ public IndexMetadata verifyIndexMetadata(IndexMetadata indexMetadata, Version mi
// Next we have to run this otherwise if we try to create IndexSettings
// with broken settings it would fail in checkMappingsCompatibility
newMetadata = archiveBrokenIndexSettings(newMetadata);
createAndValidateMapping(newMetadata);
checkMappingsCompatibility(newMetadata);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes in this class revert the changes made in #85059, as we now validate against a MapperService with all analyzers configured when restoring the legacy index in RestoreService. The reason for doing it this way now is that it provides better error messages on restore, but also handles a tricky situation where the Mapping returned by these methods here would have their analyzer settings misconfigured as checkMappingsCompatibility would not create a proper environment with actual analyzers configured.

import static org.apache.lucene.search.MultiTermQuery.CONSTANT_SCORE_REWRITE;
import static org.hamcrest.Matchers.equalTo;

public class ConstantScoreTextFieldTypeTests extends FieldTypeTestCase {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This contains all the tests of TextFieldTypeTests with some adaptations for constant scoring.

@@ -227,7 +227,7 @@ public void testUnmappedLegacyFieldsUnderKnownRootField() throws Exception {
b.startObject("name");
b.field("type", "keyword");
b.startObject("fields");
b.startObject("subfield").field("type", "text").endObject();
b.startObject("subfield").field("type", CompletionFieldMapper.CONTENT_TYPE).endObject();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

now that we support text fields, we can't use it anymore to check "placeholder" functionality. Instead we use another unsupported field.

@ywelsch ywelsch mentioned this pull request May 10, 2022
32 tasks
@ywelsch ywelsch requested a review from romseygeek May 10, 2022 09:27
@ywelsch ywelsch added >non-issue :Search/Search Search-related issues that do not fall into other categories labels May 10, 2022
@ywelsch ywelsch marked this pull request as ready for review May 10, 2022 09:27
@elasticmachine elasticmachine added the Team:Search Meta label for search team label May 10, 2022
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search (Team:Search)

Copy link
Contributor

@romseygeek romseygeek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ywelsch! This looks pretty close - I left a couple of questions.

// Disable scoring
return new ConstantScoreQuery(super.phrasePrefixQuery(stream, slop, maxExpansions, queryShardContext));
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Span prefix queries will do something weird here, won't they? But then span prefix queries are kind of weird in any case. I think they're enough of an edge case that we can override spanPrefixQuery() and throw an exception saying we don't support them on legacy indexes

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that we don't need to support them. I've pushed 604d70c but I'm not sure how to add a test for it (I couldn't find existing tests that exercise this method).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks as though the two places it would be used are 1) a prefix query wrapped in a span multiterm query, which can be replaced by an interval query; and 2) a match_phrase_prefix query that uses multiterm synonyms, which will get factored away when we rework things to use QueryBuilders properly. So I think we can happily just throw an exception here and not worry about it further :)

assertFalse(ft.isAggregatable());
ft.setFielddata(true);
assertTrue(ft.isAggregatable());
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we support fielddata on older indexes or will that run into the same issues with global stats? I think we probably need to either explicitly disable it and throw an error when it's accessed or have a test for a significant terms agg against a legacy text field.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Disabling fielddata on older indexes is ok I think (the main purpose, as stated in the PR description, is to provide basic filtering support on archive indices). I've addressed this in 56d391c

Copy link
Contributor

@romseygeek romseygeek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks @ywelsch

// Disable scoring
return new ConstantScoreQuery(super.phrasePrefixQuery(stream, slop, maxExpansions, queryShardContext));
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks as though the two places it would be used are 1) a prefix query wrapped in a span multiterm query, which can be replaced by an interval query; and 2) a match_phrase_prefix query that uses multiterm synonyms, which will get factored away when we rework things to use QueryBuilders properly. So I think we can happily just throw an exception here and not worry about it further :)

@ywelsch
Copy link
Contributor Author

ywelsch commented May 16, 2022

@javanna are you ok to merge this, or would you like to have a look first?

@ywelsch
Copy link
Contributor Author

ywelsch commented May 16, 2022

@elasticmachine run elasticsearch-ci/packaging-tests-windows-sample

Copy link
Member

@javanna javanna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ywelsch ywelsch merged commit 5aebb8e into elastic:master May 18, 2022
@ywelsch
Copy link
Contributor Author

ywelsch commented May 18, 2022

Thanks @romseygeek and @javanna!

ywelsch added a commit that referenced this pull request May 18, 2022
The above PR was merged concurrently to another one that refactored
a method name.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>non-issue :Search/Search Search-related issues that do not fall into other categories Team:Search Meta label for search team v8.3.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants