-
Notifications
You must be signed in to change notification settings - Fork 25.2k
Add text field support to archive indices #86591
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@@ -92,7 +89,7 @@ public IndexMetadata verifyIndexMetadata(IndexMetadata indexMetadata, Version mi | |||
// Next we have to run this otherwise if we try to create IndexSettings | |||
// with broken settings it would fail in checkMappingsCompatibility | |||
newMetadata = archiveBrokenIndexSettings(newMetadata); | |||
createAndValidateMapping(newMetadata); | |||
checkMappingsCompatibility(newMetadata); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The changes in this class revert the changes made in #85059, as we now validate against a MapperService with all analyzers configured when restoring the legacy index in RestoreService
. The reason for doing it this way now is that it provides better error messages on restore, but also handles a tricky situation where the Mapping returned by these methods here would have their analyzer settings misconfigured as checkMappingsCompatibility would not create a proper environment with actual analyzers configured.
import static org.apache.lucene.search.MultiTermQuery.CONSTANT_SCORE_REWRITE; | ||
import static org.hamcrest.Matchers.equalTo; | ||
|
||
public class ConstantScoreTextFieldTypeTests extends FieldTypeTestCase { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This contains all the tests of TextFieldTypeTests
with some adaptations for constant scoring.
@@ -227,7 +227,7 @@ public void testUnmappedLegacyFieldsUnderKnownRootField() throws Exception { | |||
b.startObject("name"); | |||
b.field("type", "keyword"); | |||
b.startObject("fields"); | |||
b.startObject("subfield").field("type", "text").endObject(); | |||
b.startObject("subfield").field("type", CompletionFieldMapper.CONTENT_TYPE).endObject(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
now that we support text fields, we can't use it anymore to check "placeholder" functionality. Instead we use another unsupported field.
Pinging @elastic/es-search (Team:Search) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @ywelsch! This looks pretty close - I left a couple of questions.
// Disable scoring | ||
return new ConstantScoreQuery(super.phrasePrefixQuery(stream, slop, maxExpansions, queryShardContext)); | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Span prefix queries will do something weird here, won't they? But then span prefix queries are kind of weird in any case. I think they're enough of an edge case that we can override spanPrefixQuery()
and throw an exception saying we don't support them on legacy indexes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that we don't need to support them. I've pushed 604d70c but I'm not sure how to add a test for it (I couldn't find existing tests that exercise this method).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks as though the two places it would be used are 1) a prefix query wrapped in a span multiterm query, which can be replaced by an interval query; and 2) a match_phrase_prefix query that uses multiterm synonyms, which will get factored away when we rework things to use QueryBuilders properly. So I think we can happily just throw an exception here and not worry about it further :)
assertFalse(ft.isAggregatable()); | ||
ft.setFielddata(true); | ||
assertTrue(ft.isAggregatable()); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we support fielddata on older indexes or will that run into the same issues with global stats? I think we probably need to either explicitly disable it and throw an error when it's accessed or have a test for a significant terms agg against a legacy text field.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Disabling fielddata on older indexes is ok I think (the main purpose, as stated in the PR description, is to provide basic filtering support on archive indices). I've addressed this in 56d391c
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks @ywelsch
// Disable scoring | ||
return new ConstantScoreQuery(super.phrasePrefixQuery(stream, slop, maxExpansions, queryShardContext)); | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks as though the two places it would be used are 1) a prefix query wrapped in a span multiterm query, which can be replaced by an interval query; and 2) a match_phrase_prefix query that uses multiterm synonyms, which will get factored away when we rework things to use QueryBuilders properly. So I think we can happily just throw an exception here and not worry about it further :)
@javanna are you ok to merge this, or would you like to have a look first? |
@elasticmachine run elasticsearch-ci/packaging-tests-windows-sample |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Thanks @romseygeek and @javanna! |
The above PR was merged concurrently to another one that refactored a method name.
Adds support for "text" fields in archive indices, with the goal of adding simple filtering support on text fields when querying archive indices.
There are some differences to regular text fields:
match_only_text
).analyzer
fields can be updatedanalyzer
is not available, falls back to defaultanalyzer
The above limitations also give us the flexibility to eventually swap out the implementation with a "runtime-text field" variant, and hence only provide those capabilities that can be emulated via a runtime field.
Relates #81210