
Commit e421477

Allow docvalues-only search on number types (#82409)
Allows searching on number field types (long, short, int, float, double, byte, half_float) when those fields are not indexed (index: false) but only have doc values enabled. This enables searches on archive data, which has access to doc values but not to index structures. Combined with searchable snapshots, it allows downloading only the data for a given (doc value) field to quickly filter down to a select set of documents.

Note to reviewers: I have split isSearchable into two separate methods on MappedFieldType, isIndexed and isSearchable. The former indicates whether actual indexing data structures (postings or points) have been built; the latter indicates whether queries can be run on the given field (e.g. as reported by field caps). For number field types, queries are now allowed whenever points or doc values are available, i.e. searchability is expanded.

Relates #81210 and #52728
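The sketch below makes the isIndexed / isSearchable split concrete in plain Java. `NumberFieldSketch` is a hypothetical stand-in, not Elasticsearch's `MappedFieldType`. It only encodes the rule stated above for number fields: a field counts as searchable when it has points or doc values.

```java
import java.util.Objects;

/**
 * Hypothetical, simplified stand-in for a numeric field type. This is NOT the
 * Elasticsearch MappedFieldType; it only models the isIndexed/isSearchable split.
 */
final class NumberFieldSketch {
    private final String name;
    private final boolean indexed;      // points were written (index: true)
    private final boolean hasDocValues; // doc values were written (doc_values: true)

    NumberFieldSketch(String name, boolean indexed, boolean hasDocValues) {
        this.name = Objects.requireNonNull(name);
        this.indexed = indexed;
        this.hasDocValues = hasDocValues;
    }

    String name() {
        return name;
    }

    /** True only if real index structures (points, for numbers) exist. */
    boolean isIndexed() {
        return indexed;
    }

    boolean hasDocValues() {
        return hasDocValues;
    }

    /**
     * True if queries can run on the field at all (what field caps would report).
     * For number fields this is widened: points OR doc values are enough.
     */
    boolean isSearchable() {
        return indexed || hasDocValues;
    }
}
```

With `index: false, doc_values: true`, the field reports `isIndexed() == false` but `isSearchable() == true`. The field caps YAML test in this commit checks the same outcome for a `long` field.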
1 parent 42afe10 commit e421477


50 files changed (+619, -189 lines)


docs/reference/mapping/params/doc-values.asciidoc

Lines changed: 7 additions & 0 deletions

@@ -17,6 +17,13 @@ makes this data access pattern possible. They store the same values as the
 sorting and aggregations. Doc values are supported on almost all field types,
 with the __notable exception of `text` and `annotated_text` fields__.
 
+<<number,Numeric types>>, such as `long` and `double`, can also be queried
+when they are not <<mapping-index,indexed>> but only have doc values enabled.
+Query performance on doc values is much slower than on index structures, but
+offers an interesting tradeoff between disk usage and query performance for
+fields that are only rarely queried and where query performance is not as
+important.
+
 All fields which support doc values have them enabled by default. If you are
 sure that you don't need to sort or aggregate on a field, or access the field
 value from a script, you can disable doc values in order to save disk space:
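The tradeoff described above can be illustrated with a toy model in plain Java (no Lucene). It counts documents in a range in two ways. The first scans a doc-values style column, where each document maps to its value. The second uses binary search over a sorted copy of the values, standing in for an index structure. `RangeCountSketch` and its methods are hypothetical. Real Lucene points are stored in BKD trees, not sorted arrays, so this is only an intuition for why the doc-values path is linear in the number of documents.

```java
import java.util.Arrays;

/** Toy model (not Lucene) contrasting a doc-values scan with an index lookup. */
final class RangeCountSketch {

    /** Doc-values path: every document must be visited, O(n). */
    static int countWithDocValues(long[] valuesByDoc, long lo, long hi) {
        int count = 0;
        for (long v : valuesByDoc) {
            if (v >= lo && v <= hi) {
                count++;
            }
        }
        return count;
    }

    /** Index path: two binary searches over sorted values, O(log n). */
    static int countWithSortedIndex(long[] sortedValues, long lo, long hi) {
        int count = firstIndexGreaterThan(sortedValues, hi) - firstIndexAtLeast(sortedValues, lo);
        return Math.max(0, count); // an empty range (lo > hi) matches nothing
    }

    /** Builds the toy "index"; in reality this is the extra disk space being traded away. */
    static long[] buildIndex(long[] valuesByDoc) {
        long[] sorted = valuesByDoc.clone();
        Arrays.sort(sorted);
        return sorted;
    }

    private static int firstIndexAtLeast(long[] a, long key) {
        int low = 0, high = a.length;
        while (low < high) {
            int mid = (low + high) >>> 1;
            if (a[mid] < key) {
                low = mid + 1;
            } else {
                high = mid;
            }
        }
        return low;
    }

    private static int firstIndexGreaterThan(long[] a, long key) {
        int low = 0, high = a.length;
        while (low < high) {
            int mid = (low + high) >>> 1;
            if (a[mid] <= key) {
                low = mid + 1;
            } else {
                high = mid;
            }
        }
        return low;
    }
}
```

Both methods return the same counts. Only the amount of work differs, which is the "much slower, but smaller on disk" tradeoff the docs describe.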

docs/reference/mapping/params/index.asciidoc

Lines changed: 2 additions & 1 deletion

@@ -2,5 +2,6 @@
 === `index`
 
 The `index` option controls whether field values are indexed. It accepts `true`
-or `false` and defaults to `true`. Fields that are not indexed are not queryable.
+or `false` and defaults to `true`. Fields that are not indexed are typically
+not queryable.
 
docs/reference/mapping/types/numeric.asciidoc

Lines changed: 3 additions & 1 deletion

@@ -131,7 +131,9 @@ The following parameters are accepted by numeric types:
 
 <<mapping-index,`index`>>::
 
-Should the field be searchable? Accepts `true` (default) and `false`.
+Should the field be quickly searchable? Accepts `true` (default) and
+`false`. Numeric fields that only have <<doc-values,`doc_values`>>
+enabled can also be queried, albeit slower.
 
 <<mapping-field-meta,`meta`>>::
 
docs/reference/query-dsl.asciidoc

Lines changed: 1 addition & 0 deletions

@@ -33,6 +33,7 @@ the stability of the cluster. Those queries can be categorised as follows:
 
 * Queries that need to do linear scans to identify matches:
 ** <<query-dsl-script-query,`script` queries>>
+** queries on <<number,numeric fields>> that are not indexed but have <<doc-values,doc values>> enabled
 
 * Queries that have a high up-front cost:
 ** <<query-dsl-fuzzy-query,`fuzzy` queries>> (except on
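The tests in this commit show that a doc-values-only query is rejected when `search.allow_expensive_queries` is false. The sketch below shows one possible gate, assuming the control flow shown. The diff names the real method `failIfNotIndexedNorDocValuesFallback` but does not show its body. `SearchGateSketch`, its `Path` enum, and the no-doc-values error message are hypothetical, and `IllegalStateException` stands in for `ElasticsearchException`. Only the expensive-queries message is copied from the tests.

```java
/**
 * Hypothetical sketch of gating a doc-values fallback behind an
 * "allow expensive queries" flag. Control flow is assumed, not taken from Elasticsearch.
 */
final class SearchGateSketch {

    enum Path { INDEX, DOC_VALUES_SCAN }

    static Path choosePath(String field, boolean indexed, boolean hasDocValues, boolean allowExpensiveQueries) {
        if (indexed) {
            return Path.INDEX; // cheap path: points are available
        }
        if (!hasDocValues) {
            // Assumed wording; the real message may differ.
            throw new IllegalArgumentException(
                "Cannot search on field [" + field + "] since it is not indexed and has no doc values."
            );
        }
        if (!allowExpensiveQueries) {
            // Message text copied from ScaledFloatFieldTypeTests in this commit.
            throw new IllegalStateException(
                "Cannot search on field [" + field + "] since it is not indexed and "
                    + "'search.allow_expensive_queries' is set to false."
            );
        }
        return Path.DOC_VALUES_SCAN; // linear scan over doc values
    }
}
```

The ordering puts the indexed check first, so fields with points never pay for the expensive-query check. Only the doc-values scan, which the docs now list among the linear-scan queries, is subject to the setting.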

modules/mapper-extras/src/main/java/org/elasticsearch/index/mapper/extras/ScaledFloatFieldMapper.java

Lines changed: 19 additions & 11 deletions

@@ -198,7 +198,11 @@ public ScaledFloatFieldType(
 }
 
 public ScaledFloatFieldType(String name, double scalingFactor) {
-this(name, true, false, true, Collections.emptyMap(), scalingFactor, null, null);
+this(name, scalingFactor, true);
+}
+
+public ScaledFloatFieldType(String name, double scalingFactor, boolean indexed) {
+this(name, indexed, false, true, Collections.emptyMap(), scalingFactor, null, null);
 }
 
 public double getScalingFactor() {
@@ -212,20 +216,24 @@ public String typeName() {
 
 @Override
 public Query termQuery(Object value, SearchExecutionContext context) {
-failIfNotIndexed();
+failIfNotIndexedNorDocValuesFallback(context);
 long scaledValue = Math.round(scale(value));
-return NumberFieldMapper.NumberType.LONG.termQuery(name(), scaledValue);
+return NumberFieldMapper.NumberType.LONG.termQuery(name(), scaledValue, isIndexed());
 }
 
 @Override
 public Query termsQuery(Collection<?> values, SearchExecutionContext context) {
-failIfNotIndexed();
-List<Long> scaledValues = new ArrayList<>(values.size());
-for (Object value : values) {
-long scaledValue = Math.round(scale(value));
-scaledValues.add(scaledValue);
+failIfNotIndexedNorDocValuesFallback(context);
+if (isIndexed()) {
+List<Long> scaledValues = new ArrayList<>(values.size());
+for (Object value : values) {
+long scaledValue = Math.round(scale(value));
+scaledValues.add(scaledValue);
+}
+return NumberFieldMapper.NumberType.LONG.termsQuery(name(), Collections.unmodifiableList(scaledValues));
+} else {
+return super.termsQuery(values, context);
 }
-return NumberFieldMapper.NumberType.LONG.termsQuery(name(), Collections.unmodifiableList(scaledValues));
 }
 
 @Override
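In the `scaled_float` terms query, the indexed branch scales each value with `Math.round(scale(value))` and builds a points set query. The non-indexed branch delegates to `super.termsQuery`. The sketch below models only the net effect of the non-indexed path: a per-document membership test over scaled longs. It assumes that `scale(value)` multiplies by the scaling factor, which matches how the tests compute `Math.round(value * ft.getScalingFactor())`. `ScaledTermsSketch` and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of scaled_float terms matching against a doc-values column. */
final class ScaledTermsSketch {

    /** Scales query values the way stored values are scaled: round(value * factor). */
    static Set<Long> scaledTerms(Collection<? extends Number> values, double scalingFactor) {
        Set<Long> scaled = new HashSet<>();
        for (Number value : values) {
            scaled.add(Math.round(value.doubleValue() * scalingFactor));
        }
        return scaled;
    }

    /** Doc-values fallback: returns ids of documents whose stored long is in the set. */
    static List<Integer> matchDocValues(long[] scaledValuesByDoc, Set<Long> terms) {
        List<Integer> docs = new ArrayList<>();
        for (int doc = 0; doc < scaledValuesByDoc.length; doc++) {
            if (terms.contains(scaledValuesByDoc[doc])) {
                docs.add(doc);
            }
        }
        return docs;
    }
}
```

Query values and stored values must be rounded the same way. Otherwise, for example, `1.23` with a scaling factor of `100` would not match the stored long `123`.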
@@ -236,7 +244,7 @@ public Query rangeQuery(
 boolean includeUpper,
 SearchExecutionContext context
 ) {
-failIfNotIndexed();
+failIfNotIndexedNorDocValuesFallback(context);
 Long lo = null;
 if (lowerTerm != null) {
 double dValue = scale(lowerTerm);
@@ -253,7 +261,7 @@ public Query rangeQuery(
 }
 hi = Math.round(Math.floor(dValue));
 }
-return NumberFieldMapper.NumberType.LONG.rangeQuery(name(), lo, hi, true, true, hasDocValues(), context);
+return NumberFieldMapper.NumberType.LONG.rangeQuery(name(), lo, hi, true, true, hasDocValues(), context, isIndexed());
 }
 
 @Override
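The range conversion above ends in an inclusive `[lo, hi]` range over scaled longs, as the `true, true` arguments passed to `LONG.rangeQuery` show. The sketch below illustrates that conversion. The upper-bound step `Math.round(Math.floor(dValue))` appears in the diff. The lower-bound ceiling and the `Math.nextUp`/`Math.nextDown` handling of exclusive bounds are assumptions, because those lines are elided here. `ScaledRangeSketch` is hypothetical.

```java
/**
 * Hypothetical sketch: converts a double range into an inclusive range on scaled longs.
 * Only the floor-then-round upper bound is taken from the diff; the rest is assumed.
 */
final class ScaledRangeSketch {

    /** Returns {lo, hi}, both inclusive; a null entry means that side is unbounded. */
    static Long[] toScaledBounds(Double lower, Double upper, boolean includeLower, boolean includeUpper, double scalingFactor) {
        Long lo = null;
        if (lower != null) {
            double d = lower * scalingFactor;
            if (!includeLower) {
                d = Math.nextUp(d); // step past the bound so it is excluded
            }
            lo = Math.round(Math.ceil(d)); // smallest stored long >= bound
        }
        Long hi = null;
        if (upper != null) {
            double d = upper * scalingFactor;
            if (!includeUpper) {
                d = Math.nextDown(d); // step below the bound so it is excluded
            }
            hi = Math.round(Math.floor(d)); // largest stored long <= bound
        }
        return new Long[] {lo, hi};
    }
}
```

With a scaling factor of `10`, the inclusive range `[1.25, 3.0]` becomes `[13, 30]`, and the exclusive range `(1.5, 3.0)` becomes `[16, 29]`. Once both bounds are inclusive longs, the same `LONG.rangeQuery` can run against either points or doc values.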

modules/mapper-extras/src/test/java/org/elasticsearch/index/mapper/extras/ScaledFloatFieldTypeTests.java

Lines changed: 38 additions & 5 deletions

@@ -18,6 +18,8 @@
 import org.apache.lucene.search.IndexSearcher;
 import org.apache.lucene.search.Query;
 import org.apache.lucene.store.Directory;
+import org.apache.lucene.util.NumericUtils;
+import org.elasticsearch.ElasticsearchException;
 import org.elasticsearch.core.internal.io.IOUtils;
 import org.elasticsearch.index.fielddata.IndexNumericFieldData;
 import org.elasticsearch.index.fielddata.LeafNumericFieldData;
@@ -41,7 +43,14 @@ public void testTermQuery() {
 );
 double value = (randomDouble() * 2 - 1) * 10000;
 long scaledValue = Math.round(value * ft.getScalingFactor());
-assertEquals(LongPoint.newExactQuery("scaled_float", scaledValue), ft.termQuery(value, null));
+assertEquals(LongPoint.newExactQuery("scaled_float", scaledValue), ft.termQuery(value, MOCK_CONTEXT));
+
+MappedFieldType ft2 = new ScaledFloatFieldMapper.ScaledFloatFieldType("scaled_float", 0.1 + randomDouble() * 100, false);
+ElasticsearchException e2 = expectThrows(ElasticsearchException.class, () -> ft2.termQuery("42", MOCK_CONTEXT_DISALLOW_EXPENSIVE));
+assertEquals(
+"Cannot search on field [scaled_float] since it is not indexed and 'search.allow_expensive_queries' is set to false.",
+e2.getMessage()
+);
 }
 
 public void testTermsQuery() {
@@ -53,7 +62,20 @@ public void testTermsQuery() {
 long scaledValue1 = Math.round(value1 * ft.getScalingFactor());
 double value2 = (randomDouble() * 2 - 1) * 10000;
 long scaledValue2 = Math.round(value2 * ft.getScalingFactor());
-assertEquals(LongPoint.newSetQuery("scaled_float", scaledValue1, scaledValue2), ft.termsQuery(Arrays.asList(value1, value2), null));
+assertEquals(
+LongPoint.newSetQuery("scaled_float", scaledValue1, scaledValue2),
+ft.termsQuery(Arrays.asList(value1, value2), MOCK_CONTEXT)
+);
+
+MappedFieldType ft2 = new ScaledFloatFieldMapper.ScaledFloatFieldType("scaled_float", 0.1 + randomDouble() * 100, false);
+ElasticsearchException e2 = expectThrows(
+ElasticsearchException.class,
+() -> ft2.termsQuery(Arrays.asList(value1, value2), MOCK_CONTEXT_DISALLOW_EXPENSIVE)
+);
+assertEquals(
+"Cannot search on field [scaled_float] since it is not indexed and 'search.allow_expensive_queries' is set to false.",
+e2.getMessage()
+);
 }
 
 public void testRangeQuery() throws IOException {
@@ -62,9 +84,9 @@ public void testRangeQuery() throws IOException {
 // searching doubles that are rounded to the closest half float
 ScaledFloatFieldMapper.ScaledFloatFieldType ft = new ScaledFloatFieldMapper.ScaledFloatFieldType(
 "scaled_float",
-true,
-false,
+randomBoolean(),
 false,
+true,
 Collections.emptyMap(),
 0.1 + randomDouble() * 100,
 null,
@@ -79,7 +101,9 @@ public void testRangeQuery() throws IOException {
 long scaledValue = Math.round(value * ft.getScalingFactor());
 double rounded = scaledValue / ft.getScalingFactor();
 doc.add(new LongPoint("scaled_float", scaledValue));
+doc.add(new SortedNumericDocValuesField("scaled_float", scaledValue));
 doc.add(new DoublePoint("double", rounded));
+doc.add(new SortedNumericDocValuesField("double", NumericUtils.doubleToSortableLong(rounded)));
 w.addDocument(doc);
 }
 final DirectoryReader reader = DirectoryReader.open(w);
@@ -91,7 +115,16 @@ public void testRangeQuery() throws IOException {
 Double u = randomBoolean() ? null : (randomDouble() * 2 - 1) * 10000;
 boolean includeLower = randomBoolean();
 boolean includeUpper = randomBoolean();
-Query doubleQ = NumberFieldMapper.NumberType.DOUBLE.rangeQuery("double", l, u, includeLower, includeUpper, false, MOCK_CONTEXT);
+Query doubleQ = NumberFieldMapper.NumberType.DOUBLE.rangeQuery(
+"double",
+l,
+u,
+includeLower,
+includeUpper,
+false,
+MOCK_CONTEXT,
+randomBoolean()
+);
 Query scaledFloatQ = ft.rangeQuery(l, u, includeLower, includeUpper, MOCK_CONTEXT);
 assertEquals(searcher.count(doubleQ), searcher.count(scaledFloatQ));
 }
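`testRangeQuery` now also writes doc values for the `double` field using `NumericUtils.doubleToSortableLong`. This lets a doc-values range query compare plain signed longs. The sketch below reimplements the sign-flipping bit trick behind that encoding for illustration. It is meant to mirror Lucene's encoding, but `SortableDoubleSketch` is a hypothetical class, so treat Lucene's `NumericUtils` as the authoritative version.

```java
/** Hypothetical illustration of an order-preserving double-to-long encoding. */
final class SortableDoubleSketch {

    /** Encodes a double so that signed long order matches Double.compare order. */
    static long doubleToSortableLong(double value) {
        long bits = Double.doubleToLongBits(value);
        // For negative doubles (sign bit set), flip all non-sign bits so that
        // larger magnitudes become smaller longs; positive doubles are unchanged.
        return bits ^ ((bits >> 63) & 0x7fffffffffffffffL);
    }

    /** Decodes; the transform is its own inverse because the sign bit never changes. */
    static double sortableLongToDouble(long encoded) {
        long bits = encoded ^ ((encoded >> 63) & 0x7fffffffffffffffL);
        return Double.longBitsToDouble(bits);
    }
}
```

Because the encoding preserves order, a range over doubles maps to a range over longs. That is what allows the test to compare a doc-values-only `double` range query with the `scaled_float` one.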

modules/percolator/src/test/java/org/elasticsearch/percolator/CandidateQueryTests.java

Lines changed: 8 additions & 7 deletions

@@ -257,7 +257,7 @@ public void testDuel() throws Exception {
 // many iterations with boolean queries, which are the most complex queries to deal with when nested
 int numRandomBoolQueries = 1000;
 for (int i = 0; i < numRandomBoolQueries; i++) {
-queryFunctions.add(() -> createRandomBooleanQuery(1, stringFields, stringContent, intFieldType, intValues));
+queryFunctions.add(() -> createRandomBooleanQuery(1, stringFields, stringContent, intFieldType, intValues, context));
 }
 queryFunctions.add(() -> {
 int numClauses = randomIntBetween(1, 1 << randomIntBetween(2, 4));
@@ -312,7 +312,8 @@ private BooleanQuery createRandomBooleanQuery(
 List<String> fields,
 Map<String, List<String>> content,
 MappedFieldType intFieldType,
-List<Integer> intValues
+List<Integer> intValues,
+SearchExecutionContext context
 ) {
 BooleanQuery.Builder builder = new BooleanQuery.Builder();
 int numClauses = randomIntBetween(1, 1 << randomIntBetween(2, 4)); // use low numbers of clauses more often
@@ -326,24 +327,24 @@ private BooleanQuery createRandomBooleanQuery(
 String field = randomFrom(fields);
 builder.add(new TermQuery(new Term(field, randomFrom(content.get(field)))), occur);
 } else {
-builder.add(intFieldType.termQuery(randomFrom(intValues), null), occur);
+builder.add(intFieldType.termQuery(randomFrom(intValues), context), occur);
 }
 } else if (rarely() && depth <= 3) {
 occur = randomFrom(Arrays.asList(Occur.FILTER, Occur.MUST, Occur.SHOULD));
-builder.add(createRandomBooleanQuery(depth + 1, fields, content, intFieldType, intValues), occur);
+builder.add(createRandomBooleanQuery(depth + 1, fields, content, intFieldType, intValues, context), occur);
 } else if (rarely()) {
 if (randomBoolean()) {
 occur = randomFrom(Arrays.asList(Occur.FILTER, Occur.MUST, Occur.SHOULD));
 if (randomBoolean()) {
 builder.add(new TermQuery(new Term("unknown_field", randomAlphaOfLength(8))), occur);
 } else {
-builder.add(intFieldType.termQuery(randomFrom(intValues), null), occur);
+builder.add(intFieldType.termQuery(randomFrom(intValues), context), occur);
 }
 } else if (randomBoolean()) {
 String field = randomFrom(fields);
 builder.add(new TermQuery(new Term(field, randomFrom(content.get(field)))), occur = Occur.MUST_NOT);
 } else {
-builder.add(intFieldType.termQuery(randomFrom(intValues), null), occur = Occur.MUST_NOT);
+builder.add(intFieldType.termQuery(randomFrom(intValues), context), occur = Occur.MUST_NOT);
 }
 } else {
 if (randomBoolean()) {
@@ -352,7 +353,7 @@ private BooleanQuery createRandomBooleanQuery(
 String field = randomFrom(fields);
 builder.add(new TermQuery(new Term(field, randomFrom(content.get(field)))), occur);
 } else {
-builder.add(intFieldType.termQuery(randomFrom(intValues), null), occur);
+builder.add(intFieldType.termQuery(randomFrom(intValues), context), occur);
 }
 } else {
 builder.add(new TermQuery(new Term("unknown_field", randomAlphaOfLength(8))), occur = Occur.MUST_NOT);

rest-api-spec/build.gradle

Lines changed: 2 additions & 0 deletions

@@ -218,6 +218,8 @@ tasks.named("yamlRestTestV7CompatTransform").configure { task ->
 // sync_id is no longer available in SegmentInfos.userData // "indices.flush/10_basic/Index synced flush rest test"
 task.replaceIsTrue("indices.testing.shards.0.0.commit.user_data.sync_id", "indices.testing.shards.0.0.commit.user_data")
 
+// we can now search using doc values only
+task.replaceValueInMatch("fields.object\\.nested1.long.searchable", true)
 }
 
 tasks.register('enforceYamlTestConvention').configure {

rest-api-spec/src/yamlRestTest/resources/rest-api-spec/test/field_caps/10_basic.yml

Lines changed: 13 additions & 1 deletion

@@ -178,7 +178,6 @@ setup:
 index: 'test1,test2,test3'
 fields: object*
 
-- match: {fields.object\.nested1.long.searchable: false}
 - match: {fields.object\.nested1.long.aggregatable: true}
 - match: {fields.object\.nested1.long.indices: ["test3"]}
 - is_false: fields.object\.nested1.long.non_searchable_indices
@@ -198,6 +197,19 @@ setup:
 - match: {fields.object\.nested2.keyword.indices: ["test3"]}
 - is_false: fields.object\.nested2.keyword.non_aggregatable_indices
 - is_false: fields.object\.nested2.keyword.non_searchable_indices
+
+---
+"Field caps for number field with only doc values":
+- skip:
+version: " - 8.0.99"
+reason: "doc values search was added in 8.1.0"
+- do:
+field_caps:
+index: 'test1,test2,test3'
+fields: object*
+
+- match: {fields.object\.nested1.long.searchable: true}
+
 ---
 "Get object and nested field caps":
 