Add composite aggregation documentation #7666
Changes from 6 commits
978dcdd
bcf8016
c4144e0
6850148
13d7217
621d0c0
1fdb04d
e8adaf2
5b7b264
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,79 @@ | ||
--- | ||
layout: default | ||
title: Composite | ||
parent: Bucket aggregations | ||
grand_parent: Aggregations | ||
nav_order: 20 | ||
has_children: true | ||
--- | ||
|
||
# Composite | ||
|
||
The `composite` aggregation is a multi-bucket aggregation that creates composite buckets from different sources. It is useful for efficiently paginating multi-level aggregations and retrieving all buckets. Composite buckets are built from combinations of values extracted from documents for each specified source field. | ||
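
The pagination use case mentioned above can be sketched in Python. This is a minimal sketch, not the project's client code: it assumes the response exposes an `after_key` object that is passed back as the `after` parameter of the next request (neither is shown elsewhere on this page), and `search` is a hypothetical callable that sends a request body to the `_search` endpoint and returns the parsed JSON response.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional


def iter_composite_buckets(
    search: Callable[[Dict[str, Any]], Dict[str, Any]],  # hypothetical transport function
    sources: List[Dict[str, Any]],
    agg_name: str = "my_buckets",
    page_size: int = 100,
) -> Iterator[Dict[str, Any]]:
    """Yield every composite bucket, requesting one page at a time."""
    after: Optional[Dict[str, Any]] = None
    while True:
        composite: Dict[str, Any] = {"size": page_size, "sources": sources}
        if after is not None:
            # Assumption: the previous page's "after_key" is sent back as "after".
            composite["after"] = after
        body = {"size": 0, "aggs": {agg_name: {"composite": composite}}}
        result = search(body)["aggregations"][agg_name]
        buckets = result.get("buckets", [])
        if not buckets:
            return  # an empty page means every bucket has been returned
        yield from buckets
        after = result.get("after_key")
        if after is None:
            return
```

Because the function is a generator, callers can stop iterating early without requesting the remaining pages.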
|
||
## Syntax | ||
|
||
```json | ||
{ | ||
"composite": { | ||
"sources": [ | ||
{ | ||
"source_field_1": { | ||
"terms": { | ||
"field": "field_name" | ||
} | ||
} | ||
}, | ||
{ | ||
"source_field_2": { | ||
"terms": { | ||
"field": "another_field_name" | ||
} | ||
} | ||
} | ||
] | ||
} | ||
} | ||
``` | ||
{% include copy-curl.html %} | ||
|
||
Property | Description | | ||
---------|------------| | ||
`composite` | The aggregation type. | ||
`sources` | An array of source objects, where each object defines a named value source for the composite buckets. | ||
`terms` | The value source type used to extract values from the specified field. Other supported types include `histogram`, `date_histogram`, and `geotile_grid`. | ||
`field` | The field name in your documents from which the values will be extracted for the corresponding source. | ||
|
||
For example, consider the following document: | ||
|
||
```json | ||
{ | ||
"product": "T-Shirt", | ||
"category": "Clothing", | ||
"brand": "Acme", | ||
"price": 19.99, | ||
"sizes": ["S", "M", "L"], | ||
"colors": ["red", "blue"] | ||
} | ||
``` | ||
{% include copy-curl.html %} | ||
|
||
Using `sizes` and `colors` as source fields for the aggregation results in the following composite buckets: | ||
|
||
```json | ||
{ "sizes": "S", "colors": "red" } | ||
{ "sizes": "S", "colors": "blue" } | ||
{ "sizes": "M", "colors": "red" } | ||
{ "sizes": "M", "colors": "blue" } | ||
{ "sizes": "L", "colors": "red" } | ||
{ "sizes": "L", "colors": "blue" } | ||
``` | ||
{% include copy-curl.html %} | ||
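
The combinations listed above are the Cartesian product of the values in each source field. The following Python sketch reproduces them for the example document. It only illustrates how the keys are formed; the helper name `composite_keys` is hypothetical, and the sketch does not model how OpenSearch sorts or counts buckets.

```python
from itertools import product

# The example document from above, reduced to the two source fields.
doc = {"sizes": ["S", "M", "L"], "colors": ["red", "blue"]}


def composite_keys(document, source_fields):
    """Return one key per combination of values across the source fields."""
    value_lists = []
    for field in source_fields:
        value = document[field]
        # Treat a single value as a one-element list.
        value_lists.append(value if isinstance(value, list) else [value])
    return [dict(zip(source_fields, combo)) for combo in product(*value_lists)]


keys = composite_keys(doc, ["sizes", "colors"])
for key in keys:
    print(key)
```

With three sizes and two colors, the sketch produces 3 × 2 = 6 keys, one for each composite bucket shown above.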
|
||
## Compatibility and limitations | ||
|
||
<SME: What version of OpenSearch is this compatible with? What are the limitations?> | ||
Technical reviewer: Please provide information about compatibility and limitations. |
||
|
||
## Performance considerations | ||
Technical reviewer: Please provide information about performance considerations, if any. |
||
|
||
<What are the performance implications or best practices for using this aggregation?> |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,8 @@ | ||
--- | ||
layout: default | ||
title: Early termination | ||
parent: Composite | ||
grand_parent: Bucket aggregations | ||
great_grand_parent: Aggregations | ||
nav_order: 35 | ||
--- |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,8 @@ | ||
--- | ||
layout: default | ||
title: Missing bucket | ||
parent: Composite | ||
grand_parent: Bucket aggregations | ||
great_grand_parent: Aggregations | ||
nav_order: 20 | ||
--- |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,137 @@ | ||
--- | ||
layout: default | ||
title: Mixing value sources | ||
parent: Composite | ||
grand_parent: Bucket aggregations | ||
great_grand_parent: Aggregations | ||
nav_order: 10 | ||
--- | ||
|
||
# Mixing value sources | ||
|
||
The `sources` parameter in the composite aggregation defines the source fields and aggregation types to use when building composite buckets. You can mix and match multiple value sources, such as `terms`, `histogram`, `date_histogram`, and `geotile_grid`, to create unique combinations of data aggregations. | ||
|
||
The order in which the sources are defined controls the order in which the keys are returned in the composite buckets. Each source must have a unique name. | ||
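
As a rough illustration of these two rules, the following Python sketch assembles a `sources` array from an ordered list of name and definition pairs and rejects duplicate names before any request is sent. The helper `build_sources` is hypothetical and performs a client-side check only; it is not part of any OpenSearch client.

```python
from typing import Any, Dict, List, Tuple


def build_sources(named_sources: List[Tuple[str, Dict[str, Any]]]) -> List[Dict[str, Any]]:
    """Build the `sources` array, preserving order and enforcing unique names."""
    seen = set()
    sources = []
    for name, definition in named_sources:
        if name in seen:
            raise ValueError(f"duplicate source name: {name!r}")
        seen.add(name)
        # Each entry is a single-key object: {source_name: {value_source_type: {...}}}
        sources.append({name: definition})
    return sources


sources = build_sources([
    ("product", {"terms": {"field": "product.keyword"}}),
    ("price_range", {"histogram": {"field": "price", "interval": 10}}),
])
```

The list position of each pair determines the position of the corresponding key in every composite bucket.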
|
||
--- | ||
|
||
## Example: Mixing terms and histogram value sources | ||
|
||
The following example creates composite buckets that combine the `product` field (using the `terms` value source) and the `price` field (using the `histogram` value source): | ||
|
||
```json | ||
GET /test_index/_search | ||
{ | ||
"size": 0, | ||
"aggs": { | ||
"my_buckets": { | ||
"composite": { | ||
"sources": [ | ||
{ "product": { "terms": { "field": "product.keyword" } } }, | ||
{ "price_range": { "histogram": { "field": "price", "interval": 10 } } } | ||
] | ||
} | ||
} | ||
} | ||
} | ||
``` | ||
{% include copy-curl.html %} | ||
|
||
This query defines two value sources: | ||
|
||
- `product`: This source uses the `terms` value source to create buckets for each unique value of the `product.keyword` field. | ||
- `price_range`: This source uses the `histogram` value source to create buckets based on the `price` field, grouped into intervals of `10`. | ||
|
||
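The resulting composite buckets have keys and document counts similar to the following simplified example. In an actual search response, the buckets are returned in the `aggregations.my_buckets.buckets` array: | ||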
|
||
```json | ||
{ | ||
"data": [ | ||
{ | ||
"key": { | ||
"product": "Jeans", | ||
"price_range": 40 | ||
}, | ||
"doc_count": 1 | ||
}, | ||
{ | ||
"key": { | ||
"product": "Sneakers", | ||
"price_range": 70 | ||
}, | ||
"doc_count": 1 | ||
}, | ||
{ | ||
"key": { | ||
"product": "T-Shirt", | ||
"price_range": 10 | ||
}, | ||
"doc_count": 1 | ||
} | ||
] | ||
} | ||
``` | ||
Each composite bucket contains the product name and the lower bound of the corresponding price interval, allowing you to analyze the distribution of products across different price ranges. | ||
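
The `price_range` keys follow from rounding each price down to a multiple of the interval. A minimal Python sketch of that arithmetic, assuming the usual histogram rule `floor((value - offset) / interval) * interval + offset` with an offset of `0`, is shown below. The function name `histogram_key` is hypothetical.

```python
import math


def histogram_key(value: float, interval: float, offset: float = 0.0) -> float:
    """Return the lower bound of the histogram bucket that contains `value`."""
    return math.floor((value - offset) / interval) * interval + offset


# The T-Shirt price from the sample document falls into the bucket starting at 10.
print(histogram_key(19.99, 10))
```

For example, a price of `19.99` with an interval of `10` maps to the key `10`, which matches the `T-Shirt` bucket above.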
|
||
--- | ||
|
||
## Example: Mixing date histogram and geotile grid value sources | ||
|
||
The following example combines the `date_histogram` and `geotile_grid` value sources to create composite buckets based on timestamps and geographic locations: | ||
|
||
```json | ||
GET /test_index/_search | ||
{ | ||
"size": 0, | ||
"aggs": { | ||
"my_buckets": { | ||
"composite": { | ||
"sources": [ | ||
{ "date": { "date_histogram": { "field": "timestamp", "calendar_interval": "1d" } } }, | ||
{ "location": { "geotile_grid": { "field": "location", "precision": 3 } } } | ||
] | ||
} | ||
} | ||
} | ||
} | ||
``` | ||
{% include copy-curl.html %} | ||
|
||
This query defines two value sources: | ||
|
||
- `date`: This source uses the `date_histogram` value source to group documents based on the day of the `timestamp` field. | ||
- `location`: This source uses the `geotile_grid` value source to aggregate `geo_point` data into buckets that correspond to cells in a grid, with a precision of `3`. | ||
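
To make the two key types more concrete, the following Python sketch derives a day bucket key and a grid cell key for a single document. It rests on assumptions that are not stated on this page: that day keys are the UTC start of the day in epoch milliseconds, and that cell keys use the standard Web Mercator tile scheme written as `zoom/x/y`. Both function names are hypothetical.

```python
import math
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def day_bucket_key(ts: datetime) -> int:
    """Return the UTC start of the day containing `ts` (timezone-aware), in epoch ms."""
    start_of_day = ts.astimezone(timezone.utc).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    return (start_of_day - EPOCH) // timedelta(milliseconds=1)


def geotile_key(lat: float, lon: float, zoom: int) -> str:
    """Return a "zoom/x/y" tile key using standard Web Mercator tile math."""
    tiles = 1 << zoom  # number of tiles along each axis at this zoom level
    x = math.floor((lon + 180.0) / 360.0 * tiles)
    lat_rad = math.radians(lat)
    y = math.floor((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * tiles)
    # Clamp to the valid tile range for points on the map edge.
    x = min(max(x, 0), tiles - 1)
    y = min(max(y, 0), tiles - 1)
    return f"{zoom}/{x}/{y}"
```

At a precision of `3`, the grid has 8 × 8 cells, so documents that are far apart can still share a cell and therefore a composite bucket.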
|
||
The response has a structure similar to the following example. In this sample response, the `buckets` array is empty: | ||
|
||
```json | ||
{ | ||
"took": 34, | ||
"timed_out": false, | ||
"_shards": { | ||
"total": 1, | ||
"successful": 1, | ||
"skipped": 0, | ||
"failed": 0 | ||
}, | ||
"hits": { | ||
"total": { | ||
"value": 3, | ||
"relation": "eq" | ||
}, | ||
"max_score": null, | ||
"hits": [] | ||
}, | ||
"aggregations": { | ||
"my_buckets": { | ||
"buckets": [] | ||
} | ||
} | ||
} | ||
``` | ||
|
||
## Considerations | ||
|
||
When mixing value sources in the `composite` aggregation, keep the following point in mind: | ||
|
||
- <SME: What are the considerations? Please list them here.> | ||
Check warning on line 137 in _aggregations/bucket/mixing-value-sources.md
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,8 @@ | ||
--- | ||
layout: default | ||
title: Ordering composite buckets | ||
parent: Composite | ||
grand_parent: Bucket aggregations | ||
great_grand_parent: Aggregations | ||
nav_order: 15 | ||
--- |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,8 @@ | ||
--- | ||
layout: default | ||
title: Size and pagination | ||
parent: Composite | ||
grand_parent: Bucket aggregations | ||
great_grand_parent: Aggregations | ||
nav_order: 25 | ||
--- |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,8 @@ | ||
--- | ||
layout: default | ||
title: Subaggregations | ||
parent: Composite | ||
grand_parent: Bucket aggregations | ||
great_grand_parent: Aggregations | ||
nav_order: 30 | ||
--- |
Technical reviewer: Please review this content and confirm the syntax and examples are accurate and relevant to an OpenSearch user. I tested the examples using Dev Tools. If another example is more appropriate, please replace the draft example with your example. Thank you.