Add text field support to archive indices #86591

ywelsch · 2022-05-10T07:51:59Z

Adds support for "text" fields in archive indices, with the goal of adding simple filtering support on text fields when querying archive indices.

There are some differences from regular text fields:

- No global statistics: queries on text fields return a constant score (similar to match_only_text).
- Analyzer fields can be updated.
- If the defined analyzer is not available, the field falls back to the default analyzer.
- There are no guarantees that analyzers are BWC.
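
The analyzer fallback listed above can be pictured with a small, self-contained sketch. It does not use the real Elasticsearch analysis classes: `ArchiveAnalyzerLookup`, `resolve`, and the modelling of an analyzer as a `String -> String` function are all hypothetical and chosen only for illustration.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical sketch, not Elasticsearch code. An "analyzer" is modelled as a
// plain String -> String function so the example stays self-contained.
final class ArchiveAnalyzerLookup {
    private final Map<String, UnaryOperator<String>> available;
    private final UnaryOperator<String> defaultAnalyzer;

    ArchiveAnalyzerLookup(Map<String, UnaryOperator<String>> available, UnaryOperator<String> defaultAnalyzer) {
        // Copy into a HashMap, which tolerates lookups with a null key.
        this.available = new HashMap<>(available);
        this.defaultAnalyzer = defaultAnalyzer;
    }

    // Returns the configured analyzer, or the default if that name is not registered.
    UnaryOperator<String> resolve(String configuredName) {
        return available.getOrDefault(configuredName, defaultAnalyzer);
    }

    public static void main(String[] args) {
        UnaryOperator<String> lowercase = text -> text.toLowerCase(Locale.ROOT);
        UnaryOperator<String> identity = text -> text;
        ArchiveAnalyzerLookup lookup = new ArchiveAnalyzerLookup(Map.of("lowercase", lowercase), identity);

        // A known analyzer name is used as configured.
        System.out.println(lookup.resolve("lowercase").apply("Quick Fox"));
        // An unknown name (for example, from a plugin that is no longer installed)
        // falls back to the default instead of failing.
        System.out.println(lookup.resolve("analyzer_from_old_cluster").apply("Quick Fox"));
    }
}
```

The point of the lenient lookup is that an archive index stays queryable even when its mapping references an analyzer the current cluster does not know about. The trade-off, as the list notes, is that results are not guaranteed to match what the original analyzer would have produced.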

The above limitations also give us the flexibility to eventually swap out the implementation with a "runtime-text field" variant, and hence only provide those capabilities that can be emulated via a runtime field.

Relates #81210


ywelsch · 2022-05-10T07:55:51Z

server/src/main/java/org/elasticsearch/cluster/metadata/IndexMetadataVerifier.java

@@ -92,7 +89,7 @@ public IndexMetadata verifyIndexMetadata(IndexMetadata indexMetadata, Version mi
        // Next we have to run this otherwise if we try to create IndexSettings
        // with broken settings it would fail in checkMappingsCompatibility
        newMetadata = archiveBrokenIndexSettings(newMetadata);
-        createAndValidateMapping(newMetadata);
+        checkMappingsCompatibility(newMetadata);


The changes in this class revert those made in #85059, because we now validate against a MapperService with all analyzers configured when restoring the legacy index in RestoreService. Doing it this way gives better error messages on restore. It also handles a tricky situation in which the Mapping returned by these methods would have its analyzer settings misconfigured, because checkMappingsCompatibility does not create a proper environment with actual analyzers configured.

ywelsch · 2022-05-10T07:58:14Z

server/src/test/java/org/elasticsearch/index/mapper/ConstantScoreTextFieldTypeTests.java

+import static org.apache.lucene.search.MultiTermQuery.CONSTANT_SCORE_REWRITE;
+import static org.hamcrest.Matchers.equalTo;
+
+public class ConstantScoreTextFieldTypeTests extends FieldTypeTestCase {


This contains all the tests of TextFieldTypeTests with some adaptations for constant scoring.
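
As a rough illustration of what an adaptation for constant scoring amounts to, the sketch below contrasts a frequency-based score with a constant-score wrapper. It is deliberately independent of Lucene and of the test class above: `ConstantScoreSketch`, its `Query` interface, and the `term` and `constantScore` helpers are hypothetical stand-ins, and the toy term-frequency score is only an assumption made to have something to compare against.

```java
// Hypothetical sketch, not Lucene or Elasticsearch code.
final class ConstantScoreSketch {
    interface Query {
        boolean matches(String doc);
        float score(String doc);
    }

    // Toy term query: the score is the number of times the term occurs in the document.
    static Query term(String term) {
        return new Query() {
            private int frequency(String doc) {
                int count = 0;
                for (String token : doc.split("\\s+")) {
                    if (token.equals(term)) {
                        count++;
                    }
                }
                return count;
            }

            @Override
            public boolean matches(String doc) {
                return frequency(doc) > 0;
            }

            @Override
            public float score(String doc) {
                return frequency(doc);
            }
        };
    }

    // Wrapper that keeps the matching behaviour but gives every match the same score.
    static Query constantScore(Query inner) {
        return new Query() {
            @Override
            public boolean matches(String doc) {
                return inner.matches(doc);
            }

            @Override
            public float score(String doc) {
                return inner.matches(doc) ? 1.0f : 0.0f;
            }
        };
    }

    public static void main(String[] args) {
        Query scored = term("fox");
        Query constant = constantScore(scored);
        System.out.println(scored.score("fox fox dog"));   // depends on term frequency
        System.out.println(constant.score("fox fox dog")); // same for every matching document
    }
}
```

With a wrapper like this, a test can still assert which documents match, but any assertion on the score has to expect the same value for every match.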

ywelsch · 2022-05-10T07:59:00Z

server/src/test/java/org/elasticsearch/index/mapper/MultiFieldTests.java

@@ -227,7 +227,7 @@ public void testUnmappedLegacyFieldsUnderKnownRootField() throws Exception {
            b.startObject("name");
            b.field("type", "keyword");
            b.startObject("fields");
-            b.startObject("subfield").field("type", "text").endObject();
+            b.startObject("subfield").field("type", CompletionFieldMapper.CONTENT_TYPE).endObject();


Now that we support text fields, we can no longer use them to check the "placeholder" functionality. Instead, we use another unsupported field type.

elasticmachine · 2022-05-10T09:27:48Z

Pinging @elastic/es-search (Team:Search)

romseygeek

Thanks @ywelsch! This looks pretty close - I left a couple of questions.

server/src/main/java/org/elasticsearch/index/mapper/FieldMapper.java

romseygeek · 2022-05-10T10:12:19Z

server/src/main/java/org/elasticsearch/index/mapper/TextFieldMapper.java

+            // Disable scoring
+            return new ConstantScoreQuery(super.phrasePrefixQuery(stream, slop, maxExpansions, queryShardContext));
+        }
+


Span prefix queries will do something weird here, won't they? But then span prefix queries are kind of weird in any case. I think they're enough of an edge case that we can override spanPrefixQuery() and throw an exception saying we don't support them on legacy indexes

I agree that we don't need to support them. I've pushed 604d70c but I'm not sure how to add a test for it (I couldn't find existing tests that exercise this method).

It looks as though the two places it would be used are 1) a prefix query wrapped in a span multiterm query, which can be replaced by an interval query; and 2) a match_phrase_prefix query that uses multiterm synonyms, which will get factored away when we rework things to use QueryBuilders properly. So I think we can happily just throw an exception here and not worry about it further :)
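
The approach agreed on in this thread (keep prefix-style queries constant scoring, but reject span prefix queries outright) could look roughly like the sketch below. It is not the code from 604d70c: the classes are simplified stand-ins, queries are represented as strings, and both the exception type and the message are assumptions.

```java
// Hypothetical sketch, not the actual TextFieldMapper code.
final class SpanPrefixSketch {
    // Stand-in for a regular text field type; queries are modelled as strings.
    static class TextFieldType {
        protected final String name;

        TextFieldType(String name) {
            this.name = name;
        }

        String prefixQuery(String prefix) {
            return name + ":" + prefix + "*";
        }

        String spanPrefixQuery(String prefix) {
            return "span(" + name + ":" + prefix + "*)";
        }
    }

    // Stand-in for the text field type used on legacy (archive) indices.
    static final class LegacyTextFieldType extends TextFieldType {
        LegacyTextFieldType(String name) {
            super(name);
        }

        @Override
        String prefixQuery(String prefix) {
            // Disable scoring by wrapping the regular query.
            return "ConstantScore(" + super.prefixQuery(prefix) + ")";
        }

        @Override
        String spanPrefixQuery(String prefix) {
            // Assumed exception type and wording; the real change may differ.
            throw new IllegalArgumentException(
                "span prefix queries are not supported on legacy indices [field=" + name + "]");
        }
    }

    public static void main(String[] args) {
        TextFieldType legacy = new LegacyTextFieldType("body");
        System.out.println(legacy.prefixQuery("foo"));
        try {
            legacy.spanPrefixQuery("foo");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Failing fast here avoids returning a span query whose behaviour on a constant-scoring field would be hard to reason about, which matches the reasoning in the comments above.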

romseygeek · 2022-05-10T10:14:26Z

server/src/test/java/org/elasticsearch/index/mapper/ConstantScoreTextFieldTypeTests.java

+        assertFalse(ft.isAggregatable());
+        ft.setFielddata(true);
+        assertTrue(ft.isAggregatable());
+    }


Can we support fielddata on older indexes or will that run into the same issues with global stats? I think we probably need to either explicitly disable it and throw an error when it's accessed or have a test for a significant terms agg against a legacy text field.

Disabling fielddata on older indexes is ok I think (the main purpose, as stated in the PR description, is to provide basic filtering support on archive indices). I've addressed this in 56d391c
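
A minimal sketch of the "explicitly disable it and throw an error" option is shown below. It is not the change from 56d391c: `FielddataSketch` and its nested classes are hypothetical, and the choice to reject `setFielddata(true)` with an `IllegalArgumentException` is an assumption made for the example.

```java
// Hypothetical sketch, not the actual TextFieldMapper code.
final class FielddataSketch {
    // Stand-in for a regular text field type, where fielddata can be switched on.
    static class TextFieldType {
        private boolean fielddata;

        void setFielddata(boolean enabled) {
            this.fielddata = enabled;
        }

        boolean isAggregatable() {
            return fielddata;
        }
    }

    // Stand-in for the legacy variant: fielddata can never be enabled.
    static final class LegacyTextFieldType extends TextFieldType {
        @Override
        void setFielddata(boolean enabled) {
            if (enabled) {
                // Assumed exception type and wording.
                throw new IllegalArgumentException("fielddata is not supported on legacy text fields");
            }
        }

        @Override
        boolean isAggregatable() {
            return false;
        }
    }

    public static void main(String[] args) {
        TextFieldType regular = new TextFieldType();
        regular.setFielddata(true);
        System.out.println("regular aggregatable: " + regular.isAggregatable());

        TextFieldType legacy = new LegacyTextFieldType();
        try {
            legacy.setFielddata(true);
        } catch (IllegalArgumentException e) {
            System.out.println("legacy rejected: " + e.getMessage());
        }
        System.out.println("legacy aggregatable: " + legacy.isAggregatable());
    }
}
```

Rejecting the setting up front keeps aggregations that depend on global statistics, such as significant terms, from running against a field that cannot provide them.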


romseygeek

LGTM, thanks @ywelsch

romseygeek · 2022-05-16T12:51:01Z

server/src/main/java/org/elasticsearch/index/mapper/TextFieldMapper.java

+            // Disable scoring
+            return new ConstantScoreQuery(super.phrasePrefixQuery(stream, slop, maxExpansions, queryShardContext));
+        }
+


It looks as though the two places it would be used are 1) a prefix query wrapped in a span multiterm query, which can be replaced by an interval query; and 2) a match_phrase_prefix query that uses multiterm synonyms, which will get factored away when we rework things to use QueryBuilders properly. So I think we can happily just throw an exception here and not worry about it further :)

ywelsch · 2022-05-16T13:32:27Z

@javanna are you ok to merge this, or would you like to have a look first?

ywelsch · 2022-05-16T13:32:51Z

@elasticmachine run elasticsearch-ci/packaging-tests-windows-sample

javanna

LGTM

ywelsch · 2022-05-18T08:25:54Z

Thanks @romseygeek and @javanna!


ywelsch added 21 commits March 24, 2022 09:02

Support old postings formats

354624f

javadoc

56d0e7b

review comments

bbaa535


Merge remote-tracking branch 'elastic/master' into old-postings

31d37af

Merge remote-tracking branch 'elastic/master' into old-postings

eb30319

Merge remote-tracking branch 'elastic/master' into old-postings

b8ffa8d

Verify / rewrite mappings using full analysis service

0d40083

allow queries on text field type

b7dd421

make analyzer lenient and updateable

7f590ad

Merge remote-tracking branch 'elastic/master' into old-postings

8d03678

fix tests

c33e835

fix test

ac09d02

test fixes

e99a176

use constant scoring

c0e508f

Merge remote-tracking branch 'elastic/master' into archive-text-field-support

55c462c

Merge remote-tracking branch 'elastic/master' into archive-text-field-support

97c49e2

revert change

d28cea8

tests

0415143

tests

7ef03bd

Merge remote-tracking branch 'elastic/master' into archive-text-field-support

9d4b7d0

fix

1da6dd1


elasticsearchmachine added the v8.3.0 label May 10, 2022

ywelsch commented May 10, 2022


remove

948f2a9

ywelsch mentioned this pull request May 10, 2022

Snapshots as simple archives #81210

Closed

32 tasks

ywelsch requested a review from romseygeek May 10, 2022 09:27

ywelsch added the >non-issue and :Search/Search labels May 10, 2022

ywelsch marked this pull request as ready for review May 10, 2022 09:27

elasticmachine added the Team:Search label May 10, 2022

ywelsch mentioned this pull request May 10, 2022

Disable get API on legacy indices #86594

Merged

ywelsch requested a review from javanna May 10, 2022 09:59

romseygeek reviewed May 10, 2022


ywelsch added 8 commits May 10, 2022 14:40

no fielddata

56d391c

Merge remote-tracking branch 'elastic/master' into archive-text-field-support

6780c48

no spans

604d70c

disable norms properly

817b5a3

Merge remote-tracking branch 'elastic/master' into archive-text-field-support

80419e3

fix existsQuery on text fields

1f96dfc

Merge remote-tracking branch 'elastic/master' into archive-text-field-support

334efdf

Merge remote-tracking branch 'elastic/master' into archive-text-field-support

3033e12

romseygeek approved these changes May 16, 2022


javanna approved these changes May 18, 2022


ywelsch merged commit 5aebb8e into elastic:master May 18, 2022

ywelsch added a commit that referenced this pull request May 18, 2022

Fix compilation for (#86591)

fd99a50

The above PR was merged concurrently to another one that refactored a method name.
