Commit fd7f69c

Allow doc-values only search on keyword fields (#82846)
Allows searching on keyword fields that are not indexed (index: false) but have doc values enabled. This enables searches on archive data, which has access to doc values but not to index structures. When combined with searchable snapshots, it allows downloading only the data for a given (doc value) field to quickly filter down to a select set of documents. Relates to #81210 and #52728.
1 parent 93d041d commit fd7f69c
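
As a rough illustration of what searching on doc values instead of index structures means, the sketch below stores the same keyword values twice: once as an inverted index (term to doc ids) and once as a per-document column. The class and method names are hypothetical, and this is a toy model rather than how Lucene lays out either structure. It only shows why a doc-values-only search can return the same hits while having to scan every document.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model (hypothetical names): the same keyword values as an inverted index and as a column. */
final class DocValuesSearchSketch {

    /** Inverted index: term -> matching doc ids. A lookup is a single map access. */
    static List<Integer> searchIndex(Map<String, List<Integer>> invertedIndex, String term) {
        return invertedIndex.getOrDefault(term, List.of());
    }

    /** Column of values: doc id -> value. A lookup has to visit every document. */
    static List<Integer> searchDocValues(String[] column, String term) {
        List<Integer> hits = new ArrayList<>();
        for (int docId = 0; docId < column.length; docId++) {
            if (term.equals(column[docId])) {
                hits.add(docId);
            }
        }
        return hits;
    }

    /** Builds the inverted index from the column so both structures hold the same data. */
    static Map<String, List<Integer>> invert(String[] column) {
        Map<String, List<Integer>> index = new HashMap<>();
        for (int docId = 0; docId < column.length; docId++) {
            index.computeIfAbsent(column[docId], t -> new ArrayList<>()).add(docId);
        }
        return index;
    }

    public static void main(String[] args) {
        String[] column = { "key1", "key2", "key1" };
        System.out.println(searchIndex(invert(column), "key1"));
        System.out.println(searchDocValues(column, "key1"));
    }
}
```

Both calls in `main` return the same doc ids. The difference is cost: the first is a lookup, the second grows with the number of documents.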

10 files changed: 154 additions and 39 deletions

docs/reference/mapping/params/doc-values.asciidoc

Lines changed: 2 additions & 2 deletions
@@ -17,8 +17,8 @@ makes this data access pattern possible. They store the same values as the
 sorting and aggregations. Doc values are supported on almost all field types,
 with the __notable exception of `text` and `annotated_text` fields__.
 
-<<number,Numeric types>>, such as `long` and `double`, and <<date,Date types>>
-can also be queried
+<<number,Numeric types>>, <<date,date types>>, and the <<keyword, keyword type>>
+can also be queried using term or range-based queries
 when they are not <<mapping-index,indexed>> but only have doc values enabled.
 Query performance on doc values is much slower than on index structures, but
 offers an interesting tradeoff between disk usage and query performance for

docs/reference/mapping/types/keyword.asciidoc

Lines changed: 4 additions & 1 deletion
@@ -80,7 +80,10 @@ The following parameters are accepted by `keyword` fields:
 
 <<mapping-index,`index`>>::
 
-    Should the field be searchable? Accepts `true` (default) or `false`.
+    Should the field be quickly searchable? Accepts `true` (default) and
+    `false`. `keyword` fields that only have <<doc-values,`doc_values`>>
+    enabled can still be queried using term or range-based queries,
+    albeit slower.
 
 <<index-options,`index_options`>>::

docs/reference/query-dsl.asciidoc

Lines changed: 1 addition & 1 deletion
@@ -33,7 +33,7 @@ the stability of the cluster. Those queries can be categorised as follows:
 
 * Queries that need to do linear scans to identify matches:
 ** <<query-dsl-script-query,`script` queries>>
-** queries on <<number,numeric>> and <<date,date>> fields that are not indexed
+** queries on <<number,numeric>>, <<date,date>>, or <<keyword,keyword>> fields that are not indexed
    but have <<doc-values,doc values>> enabled
 
 * Queries that have a high up-front cost:

rest-api-spec/src/yamlRestTest/resources/rest-api-spec/test/field_caps/10_basic.yml

Lines changed: 15 additions & 0 deletions
@@ -86,6 +86,9 @@ setup:
                 non_indexed_date:
                   type: date
                   index: false
+                non_indexed_keyword:
+                  type: keyword
+                  index: false
                 geo:
                   type: keyword
                 object:
@@ -225,6 +228,18 @@ setup:
 
   - match: {fields.non_indexed_date.date.searchable: true}
 
+---
+"Field caps for keyword field with only doc values":
+  - skip:
+      version: " - 8.0.99"
+      reason: "doc values search was added in 8.1.0"
+  - do:
+      field_caps:
+        index: 'test1,test2,test3'
+        fields: non_indexed_keyword
+
+  - match: {fields.non_indexed_keyword.keyword.searchable: true}
+
 ---
 "Get object and nested field caps":

rest-api-spec/src/yamlRestTest/resources/rest-api-spec/test/search/390_doc_values_search.yml

Lines changed: 32 additions & 0 deletions
@@ -36,6 +36,9 @@ setup:
                 type: date
                 format: yyyy/MM/dd
                 index: false
+              keyword:
+                type: keyword
+                index: false
 
   - do:
       index:
@@ -50,6 +53,7 @@ setup:
           long: 1
           short: 1
           date: "2017/01/01"
+          keyword: "key1"
 
   - do:
       index:
@@ -64,6 +68,7 @@ setup:
           long: 2
           short: 2
           date: "2017/01/02"
+          keyword: "key2"
 
   - do:
       indices.refresh: {}
@@ -220,3 +225,30 @@ setup:
         index: test
         body: { query: { range: { date: { gte: "2017/01/01" } } } }
   - length: { hits.hits: 2 }
+
+---
+"Test match query on keyword field where only doc values are enabled":
+
+  - do:
+      search:
+        index: test
+        body: { query: { match: { keyword: { query: "key1" } } } }
+  - length: { hits.hits: 1 }
+
+---
+"Test terms query on keyword field where only doc values are enabled":
+
+  - do:
+      search:
+        index: test
+        body: { query: { terms: { keyword: [ "key1", "key2" ] } } }
+  - length: { hits.hits: 2 }
+
+---
+"Test range query on keyword field where only doc values are enabled":
+
+  - do:
+      search:
+        index: test
+        body: { query: { range: { keyword: { gte: "key1" } } } }
+  - length: { hits.hits: 2 }

server/src/main/java/org/elasticsearch/index/mapper/KeywordFieldMapper.java

Lines changed: 10 additions & 0 deletions
@@ -338,6 +338,16 @@ public KeywordFieldType(String name, NamedAnalyzer analyzer) {
             this.isDimension = false;
         }
 
+        @Override
+        protected boolean allowDocValueBasedQueries() {
+            return true;
+        }
+
+        @Override
+        public boolean isSearchable() {
+            return isIndexed() || hasDocValues();
+        }
+
         @Override
         public TermsEnum getTerms(boolean caseInsensitive, String string, SearchExecutionContext queryShardContext, String searchAfter)
             throws IOException {
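
The effect of the `isSearchable()` override can be tabulated with a small stand-alone sketch. The class below is hypothetical and is not an Elasticsearch type. The "before" rule, that only indexed fields count as searchable, is an assumption inferred from the field caps test added in this commit, while the "after" rule copies the expression from the hunk above.

```java
/** Hypothetical stand-in for a keyword field type; only the two mapping flags matter here. */
final class SearchableSketch {

    /** Assumed behaviour before the commit: searchable only when indexed. */
    static boolean searchableBefore(boolean isIndexed, boolean hasDocValues) {
        return isIndexed;
    }

    /** Behaviour after the commit: mirrors isIndexed() || hasDocValues() from the diff. */
    static boolean searchableAfter(boolean isIndexed, boolean hasDocValues) {
        return isIndexed || hasDocValues;
    }

    public static void main(String[] args) {
        boolean[] flags = { true, false };
        for (boolean indexed : flags) {
            for (boolean docValues : flags) {
                System.out.printf(
                    "index=%b doc_values=%b -> before=%b after=%b%n",
                    indexed,
                    docValues,
                    searchableBefore(indexed, docValues),
                    searchableAfter(indexed, docValues)
                );
            }
        }
    }
}
```

The only row that changes is `index=false, doc_values=true`, which is the case the new field caps test asserts as `searchable: true`.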

server/src/main/java/org/elasticsearch/index/mapper/StringFieldType.java

Lines changed: 23 additions & 8 deletions
@@ -9,6 +9,7 @@
 package org.elasticsearch.index.mapper;
 
 import org.apache.lucene.analysis.Analyzer;
+import org.apache.lucene.document.SortedSetDocValuesField;
 import org.apache.lucene.index.Term;
 import org.apache.lucene.search.AutomatonQuery;
 import org.apache.lucene.search.FuzzyQuery;
@@ -210,13 +211,27 @@ public Query rangeQuery(
                     + "' is set to false."
             );
         }
-        failIfNotIndexed();
-        return new TermRangeQuery(
-            name(),
-            lowerTerm == null ? null : indexedValueForSearch(lowerTerm),
-            upperTerm == null ? null : indexedValueForSearch(upperTerm),
-            includeLower,
-            includeUpper
-        );
+        if (allowDocValueBasedQueries()) {
+            failIfNotIndexedNorDocValuesFallback(context);
+        } else {
+            failIfNotIndexed();
+        }
+        if (isIndexed()) {
+            return new TermRangeQuery(
+                name(),
+                lowerTerm == null ? null : indexedValueForSearch(lowerTerm),
+                upperTerm == null ? null : indexedValueForSearch(upperTerm),
+                includeLower,
+                includeUpper
+            );
+        } else {
+            return SortedSetDocValuesField.newSlowRangeQuery(
+                name(),
+                lowerTerm == null ? null : indexedValueForSearch(lowerTerm),
+                upperTerm == null ? null : indexedValueForSearch(upperTerm),
+                includeLower,
+                includeUpper
+            );
+        }
     }
 }
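
Both branches above receive the same five arguments, so the bound handling can be sketched independently of Lucene. The following is a minimal model with hypothetical names. It assumes a single-valued column and compares terms as unsigned UTF-8 bytes, which is how Lucene orders `BytesRef` terms. A `null` bound is treated as open-ended, matching the `lowerTerm == null ? null : ...` pattern in the diff.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical sketch of range matching by linear scan; not Lucene code. */
final class SlowRangeSketch {

    static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    /** True if value lies between the bounds; a null bound means "unbounded on that side". */
    static boolean inRange(String value, String lower, String upper, boolean includeLower, boolean includeUpper) {
        byte[] v = utf8(value);
        if (lower != null) {
            int cmp = Arrays.compareUnsigned(v, utf8(lower));
            if (cmp < 0 || (cmp == 0 && includeLower == false)) {
                return false;
            }
        }
        if (upper != null) {
            int cmp = Arrays.compareUnsigned(v, utf8(upper));
            if (cmp > 0 || (cmp == 0 && includeUpper == false)) {
                return false;
            }
        }
        return true;
    }

    /** Visits every document's value once and counts those inside the range. */
    static int countHits(String[] column, String lower, String upper, boolean includeLower, boolean includeUpper) {
        int hits = 0;
        for (String value : column) {
            if (inRange(value, lower, upper, includeLower, includeUpper)) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String[] column = { "key1", "key2" };
        System.out.println(countHits(column, "key1", null, true, false));  // gte "key1"
        System.out.println(countHits(column, "key1", null, false, false)); // gt "key1"
    }
}
```

With the two documents from the YAML test (`key1`, `key2`), the `gte: "key1"` case counts 2 hits, in line with the `hits.hits: 2` assertion, and the exclusive variant counts 1.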

server/src/main/java/org/elasticsearch/index/mapper/TermBasedFieldType.java

Lines changed: 26 additions & 4 deletions
@@ -8,7 +8,9 @@
 
 package org.elasticsearch.index.mapper;
 
+import org.apache.lucene.document.SortedSetDocValuesField;
 import org.apache.lucene.index.Term;
+import org.apache.lucene.sandbox.search.DocValuesTermsQuery;
 import org.apache.lucene.search.Query;
 import org.apache.lucene.search.TermInSetQuery;
 import org.apache.lucene.search.TermQuery;
@@ -35,6 +37,10 @@ public TermBasedFieldType(
         super(name, isIndexed, isStored, hasDocValues, textSearchInfo, meta);
     }
 
+    protected boolean allowDocValueBasedQueries() {
+        return false;
+    }
+
     /** Returns the indexed value used to construct search "values".
      *  This method is used for the default implementations of most
      *  query factory methods such as {@link #termQuery}. */
@@ -55,15 +61,31 @@ public boolean mayExistInIndex(SearchExecutionContext context) {
 
     @Override
     public Query termQuery(Object value, SearchExecutionContext context) {
-        failIfNotIndexed();
-        return new TermQuery(new Term(name(), indexedValueForSearch(value)));
+        if (allowDocValueBasedQueries()) {
+            failIfNotIndexedNorDocValuesFallback(context);
+        } else {
+            failIfNotIndexed();
+        }
+        if (isIndexed()) {
+            return new TermQuery(new Term(name(), indexedValueForSearch(value)));
+        } else {
+            return SortedSetDocValuesField.newSlowExactQuery(name(), indexedValueForSearch(value));
+        }
     }
 
     @Override
     public Query termsQuery(Collection<?> values, SearchExecutionContext context) {
-        failIfNotIndexed();
+        if (allowDocValueBasedQueries()) {
+            failIfNotIndexedNorDocValuesFallback(context);
+        } else {
+            failIfNotIndexed();
+        }
         BytesRef[] bytesRefs = values.stream().map(this::indexedValueForSearch).toArray(BytesRef[]::new);
-        return new TermInSetQuery(name(), bytesRefs);
+        if (isIndexed()) {
+            return new TermInSetQuery(name(), bytesRefs);
+        } else {
+            return new DocValuesTermsQuery(name(), bytesRefs);
+        }
     }
 
 }
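
The guard-then-dispatch pattern in `termsQuery` can be modelled without Lucene on the classpath. The sketch below is hypothetical: it returns the name of the query class that would be built instead of a real `Query`, and it does not model the `SearchExecutionContext` argument. The body of `failIfNotIndexedNorDocValuesFallback` is not part of this diff, so the error messages here are taken from the assertions in `KeywordFieldTypeTests` and the rest is an assumption.

```java
/** Hypothetical model of the termsQuery dispatch; returns query class names, not Lucene queries. */
final class QueryDispatchSketch {

    static String termsQueryKind(String field, boolean isIndexed, boolean hasDocValues, boolean allowDocValueBasedQueries) {
        if (allowDocValueBasedQueries) {
            // Assumed check: fail only when neither structure is available.
            if (isIndexed == false && hasDocValues == false) {
                throw new IllegalArgumentException(
                    "Cannot search on field [" + field + "] since it is not indexed nor has doc values."
                );
            }
        } else if (isIndexed == false) {
            // Field types that do not opt in keep the old behaviour.
            throw new IllegalArgumentException("Cannot search on field [" + field + "] since it is not indexed.");
        }
        // Prefer the index structure when present, otherwise fall back to doc values.
        return isIndexed ? "TermInSetQuery" : "DocValuesTermsQuery";
    }

    public static void main(String[] args) {
        System.out.println(termsQueryKind("field", true, true, true));
        System.out.println(termsQueryKind("field", false, true, true));
    }
}
```

One consequence of this shape is that a field type has to opt in by overriding `allowDocValueBasedQueries()`: with the default of `false`, a non-indexed field still fails even if it has doc values.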

server/src/test/java/org/elasticsearch/index/mapper/KeywordFieldTypeTests.java

Lines changed: 32 additions & 14 deletions
@@ -17,7 +17,9 @@
 import org.apache.lucene.analysis.core.WhitespaceTokenizer;
 import org.apache.lucene.analysis.standard.StandardAnalyzer;
 import org.apache.lucene.document.FieldType;
+import org.apache.lucene.document.SortedSetDocValuesField;
 import org.apache.lucene.index.Term;
+import org.apache.lucene.sandbox.search.DocValuesTermsQuery;
 import org.apache.lucene.search.DocValuesFieldExistsQuery;
 import org.apache.lucene.search.FuzzyQuery;
 import org.apache.lucene.search.NormsFieldExistsQuery;
@@ -52,7 +54,7 @@
 public class KeywordFieldTypeTests extends FieldTypeTestCase {
 
     public void testIsFieldWithinQuery() throws IOException {
-        KeywordFieldType ft = new KeywordFieldType("field");
+        KeywordFieldType ft = new KeywordFieldType("field", randomBoolean(), randomBoolean(), Map.of());
         // current impl ignores args and should always return INTERSECTS
         assertEquals(
             Relation.INTERSECTS,
@@ -64,18 +66,21 @@ public void testIsFieldWithinQuery() throws IOException {
                 randomBoolean(),
                 null,
                 null,
-                null
+                MOCK_CONTEXT
             )
         );
     }
 
     public void testTermQuery() {
         MappedFieldType ft = new KeywordFieldType("field");
-        assertEquals(new TermQuery(new Term("field", "foo")), ft.termQuery("foo", null));
+        assertEquals(new TermQuery(new Term("field", "foo")), ft.termQuery("foo", MOCK_CONTEXT));
 
-        MappedFieldType unsearchable = new KeywordFieldType("field", false, true, Collections.emptyMap());
-        IllegalArgumentException e = expectThrows(IllegalArgumentException.class, () -> unsearchable.termQuery("bar", null));
-        assertEquals("Cannot search on field [field] since it is not indexed.", e.getMessage());
+        MappedFieldType ft2 = new KeywordFieldType("field", false, true, Map.of());
+        assertEquals(SortedSetDocValuesField.newSlowExactQuery("field", new BytesRef("foo")), ft2.termQuery("foo", MOCK_CONTEXT));
+
+        MappedFieldType unsearchable = new KeywordFieldType("field", false, false, Collections.emptyMap());
+        IllegalArgumentException e = expectThrows(IllegalArgumentException.class, () -> unsearchable.termQuery("bar", MOCK_CONTEXT));
+        assertEquals("Cannot search on field [field] since it is not indexed nor has doc values.", e.getMessage());
     }
 
     public void testTermQueryWithNormalizer() {
@@ -93,38 +98,45 @@ protected TokenStream normalize(String fieldName, TokenStream in) {
             }
         };
         MappedFieldType ft = new KeywordFieldType("field", new NamedAnalyzer("my_normalizer", AnalyzerScope.INDEX, normalizer));
-        assertEquals(new TermQuery(new Term("field", "foo bar")), ft.termQuery("fOo BaR", null));
+        assertEquals(new TermQuery(new Term("field", "foo bar")), ft.termQuery("fOo BaR", MOCK_CONTEXT));
     }
 
     public void testTermsQuery() {
         MappedFieldType ft = new KeywordFieldType("field");
         List<BytesRef> terms = new ArrayList<>();
         terms.add(new BytesRef("foo"));
         terms.add(new BytesRef("bar"));
-        assertEquals(new TermInSetQuery("field", terms), ft.termsQuery(Arrays.asList("foo", "bar"), null));
+        assertEquals(new TermInSetQuery("field", terms), ft.termsQuery(Arrays.asList("foo", "bar"), MOCK_CONTEXT));
 
-        MappedFieldType unsearchable = new KeywordFieldType("field", false, true, Collections.emptyMap());
+        MappedFieldType ft2 = new KeywordFieldType("field", false, true, Map.of());
+        assertEquals(new DocValuesTermsQuery("field", terms), ft2.termsQuery(Arrays.asList("foo", "bar"), MOCK_CONTEXT));
+
+        MappedFieldType unsearchable = new KeywordFieldType("field", false, false, Collections.emptyMap());
         IllegalArgumentException e = expectThrows(
             IllegalArgumentException.class,
-            () -> unsearchable.termsQuery(Arrays.asList("foo", "bar"), null)
+            () -> unsearchable.termsQuery(Arrays.asList("foo", "bar"), MOCK_CONTEXT)
         );
-        assertEquals("Cannot search on field [field] since it is not indexed.", e.getMessage());
+        assertEquals("Cannot search on field [field] since it is not indexed nor has doc values.", e.getMessage());
     }
 
     public void testExistsQuery() {
         {
             KeywordFieldType ft = new KeywordFieldType("field");
-            assertEquals(new DocValuesFieldExistsQuery("field"), ft.existsQuery(null));
+            assertEquals(new DocValuesFieldExistsQuery("field"), ft.existsQuery(MOCK_CONTEXT));
+        }
+        {
+            KeywordFieldType ft = new KeywordFieldType("field", false, true, Map.of());
+            assertEquals(new DocValuesFieldExistsQuery("field"), ft.existsQuery(MOCK_CONTEXT));
         }
         {
             FieldType fieldType = new FieldType();
             fieldType.setOmitNorms(false);
             KeywordFieldType ft = new KeywordFieldType("field", fieldType);
-            assertEquals(new NormsFieldExistsQuery("field"), ft.existsQuery(null));
+            assertEquals(new NormsFieldExistsQuery("field"), ft.existsQuery(MOCK_CONTEXT));
         }
         {
             KeywordFieldType ft = new KeywordFieldType("field", true, false, Collections.emptyMap());
-            assertEquals(new TermQuery(new Term(FieldNamesFieldMapper.NAME, "field")), ft.existsQuery(null));
+            assertEquals(new TermQuery(new Term(FieldNamesFieldMapper.NAME, "field")), ft.existsQuery(MOCK_CONTEXT));
         }
     }
 
@@ -135,6 +147,12 @@ public void testRangeQuery() {
             ft.rangeQuery("foo", "bar", true, false, null, null, null, MOCK_CONTEXT)
         );
 
+        MappedFieldType ft2 = new KeywordFieldType("field", false, true, Map.of());
+        assertEquals(
+            SortedSetDocValuesField.newSlowRangeQuery("field", BytesRefs.toBytesRef("foo"), BytesRefs.toBytesRef("bar"), true, false),
+            ft2.rangeQuery("foo", "bar", true, false, null, null, null, MOCK_CONTEXT)
+        );
+
         ElasticsearchException ee = expectThrows(
             ElasticsearchException.class,
             () -> ft.rangeQuery("foo", "bar", true, false, null, null, null, MOCK_CONTEXT_DISALLOW_EXPENSIVE)

x-pack/plugin/sql/qa/server/src/main/java/org/elasticsearch/xpack/sql/qa/jdbc/SysColumnsTestCase.java

Lines changed: 9 additions & 9 deletions
@@ -50,22 +50,22 @@ public void testAliasWithIncompatibleTypes() throws Exception {
 
     public void testAliasWithIncompatibleSearchableProperty() throws Exception {
         createIndexWithMapping("test1", builder -> {
-            builder.startObject("id").field("type", "keyword").endObject();
+            builder.startObject("id").field("type", "text").endObject();
             builder.startObject("value").field("type", "boolean").endObject();
         });
 
         createIndexWithMapping("test2", builder -> {
-            builder.startObject("id").field("type", "keyword").field("index", false).endObject();
+            builder.startObject("id").field("type", "text").field("index", false).endObject();
             builder.startObject("value").field("type", "boolean").endObject();
         });
 
         createIndexWithMapping("test3", builder -> {
-            builder.startObject("id").field("type", "keyword").field("index", false).endObject();
+            builder.startObject("id").field("type", "text").field("index", false).endObject();
             builder.startObject("value").field("type", "boolean").endObject();
         });
 
         createIndexWithMapping("test4", builder -> {
-            builder.startObject("id").field("type", "keyword").field("index", false).endObject();
+            builder.startObject("id").field("type", "text").field("index", false).endObject();
             builder.startObject("value").field("type", "boolean").endObject();
         });
 
@@ -79,16 +79,16 @@ public void testAliasWithIncompatibleSearchableProperty() throws Exception {
         assertResultsForQuery(
             "SYS COLUMNS",
             new String[][] {
-                { "test1", "id", "KEYWORD" },
+                { "test1", "id", "TEXT" },
                 { "test1", "value", "BOOLEAN" },
-                { "test2", "id", "KEYWORD" },
+                { "test2", "id", "TEXT" },
                 { "test2", "value", "BOOLEAN" },
-                { "test3", "id", "KEYWORD" },
+                { "test3", "id", "TEXT" },
                 { "test3", "value", "BOOLEAN" },
-                { "test4", "id", "KEYWORD" },
+                { "test4", "id", "TEXT" },
                 { "test4", "value", "BOOLEAN" },
                 { "test_alias", "value", "BOOLEAN" },
-                { "test_alias2", "id", "KEYWORD" },
+                { "test_alias2", "id", "TEXT" },
                 { "test_alias2", "value", "BOOLEAN" } }
         );
     }
