Add composite aggregation documentation #7666

Open: wants to merge 9 commits into base `main`
79 changes: 79 additions & 0 deletions _aggregations/bucket/composite.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,79 @@
---
layout: default
title: Composite
parent: Bucket aggregations
grand_parent: Aggregations
nav_order: 20
has_children: true
---

# Composite

The `composite` aggregation is a multi-bucket aggregation that creates composite buckets from one or more value sources. It is useful for efficiently paginating through all buckets of a multi-level aggregation. Each composite bucket is a combination of the values extracted from a document for each specified source field.

## Syntax
Contributor Author

@vagimeli vagimeli Jul 9, 2024

Technical reviewer: Please review this content and confirm the syntax and examples are accurate and relevant to an OpenSearch user. I tested the examples using Dev Tools. If another example is more appropriate, please replace the draft example with your example. Thank you.


```json
{
  "composite": {
    "sources": [
      {
        "source_field_1": {
          "terms": {
            "field": "field_name"
          }
        }
      },
      {
        "source_field_2": {
          "terms": {
            "field": "another_field_name"
          }
        }
      }
    ]
  }
}
```
{% include copy.html %}

Property | Description
---------|------------
`composite` | The aggregation type.
`sources` | An array of source objects. Each object maps a unique, user-defined source name (for example, `source_field_1`) to the value source used to build the composite buckets.
`terms` | The value source type used to extract values from the specified field for each source.
`field` | The name of the document field from which values are extracted for the corresponding source.

For example, consider the following document:

```json
{
  "product": "T-Shirt",
  "category": "Clothing",
  "brand": "Acme",
  "price": 19.99,
  "sizes": ["S", "M", "L"],
  "colors": ["red", "blue"]
}
```
{% include copy.html %}

Using `sizes` and `colors` as source fields for the aggregation results in the following composite buckets:

```json
{ "sizes": "S", "colors": "red" }
{ "sizes": "S", "colors": "blue" }
{ "sizes": "M", "colors": "red" }
{ "sizes": "M", "colors": "blue" }
{ "sizes": "L", "colors": "red" }
{ "sizes": "L", "colors": "blue" }
```
{% include copy-curl.html %}
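
The way these combinations arise can be sketched in a few lines of Python. This is only an illustration of the idea, not how OpenSearch implements the aggregation. The `composite_keys` helper is hypothetical, and its output follows the order of the input lists rather than the sort order that OpenSearch applies to returned buckets.

```python
from itertools import product


def composite_keys(doc, source_fields):
    """Return one key per combination of the document's values for the given fields."""
    value_lists = []
    for field in source_fields:
        values = doc.get(field)
        if values is None:
            # Assumption: a document with no value for a source yields no buckets,
            # which matches the aggregation's default handling of missing values.
            return []
        # Treat a single value as a one-element list.
        value_lists.append(values if isinstance(values, list) else [values])
    # The Cartesian product of the value lists gives every combination.
    return [dict(zip(source_fields, combo)) for combo in product(*value_lists)]


doc = {
    "product": "T-Shirt",
    "category": "Clothing",
    "brand": "Acme",
    "price": 19.99,
    "sizes": ["S", "M", "L"],
    "colors": ["red", "blue"],
}

for key in composite_keys(doc, ["sizes", "colors"]):
    print(key)
```

Three sizes and two colors produce six keys, one for each pairing, which is why multi-valued source fields can multiply the number of buckets a single document contributes to.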

## Compatibility and limitations

<SME: What version of OpenSearch is this compatible with? What are the limitations?>
Contributor Author

Technical reviewer: Please provide information about compatibility and limitations.


## Performance considerations
Contributor Author

Technical reviewer: Please provide information about performance considerations, if any.


<What are the performance implications or best practices for using this aggregation?>
8 changes: 8 additions & 0 deletions _aggregations/bucket/early-termination.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
---
layout: default
title: Early termination
parent: Composite
grand_parent: Bucket aggregations
great_grand_parent: Aggregations
nav_order: 35
---
8 changes: 8 additions & 0 deletions _aggregations/bucket/missing-bucket.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
---
layout: default
title: Missing bucket
parent: Composite
grand_parent: Bucket aggregations
great_grand_parent: Aggregations
nav_order: 20
---
137 changes: 137 additions & 0 deletions _aggregations/bucket/mixing-value-sources.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,137 @@
---
layout: default
title: Mixing value sources
parent: Composite
grand_parent: Bucket aggregations
great_grand_parent: Aggregations
nav_order: 10
---

# Mixing value sources

The `sources` parameter in the composite aggregation defines the source fields and aggregation types to use when building composite buckets. You can mix and match multiple value sources, such as `terms`, `histogram`, `date_histogram`, and `geotile_grid`, to create unique combinations of data aggregations.

The order in which the sources are defined determines the order of the keys in each composite bucket. Each source in a composite aggregation must have a unique name.
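
As a rough sketch of these two rules, the following Python snippet assembles a request body from an ordered list of sources and rejects duplicate source names before anything is sent. The `build_composite_body` helper, its tuple format, and the default aggregation name `my_buckets` are assumptions made for illustration. They are not part of any OpenSearch client.

```python
import json


def build_composite_body(sources, agg_name="my_buckets"):
    """Build a search body from an ordered list of (name, value_source_type, options) tuples."""
    seen = set()
    source_list = []
    for name, source_type, options in sources:
        if name in seen:
            # Source names must be unique within one composite aggregation.
            raise ValueError(f"duplicate source name: {name!r}")
        seen.add(name)
        # List order is preserved, so it sets the order of keys in each bucket.
        source_list.append({name: {source_type: options}})
    return {"size": 0, "aggs": {agg_name: {"composite": {"sources": source_list}}}}


body = build_composite_body([
    ("product", "terms", {"field": "product.keyword"}),
    ("price_range", "histogram", {"field": "price", "interval": 10}),
])
print(json.dumps(body, indent=2))
```

Validating names on the client side is optional, but it surfaces a naming mistake before the request reaches the cluster.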

---

## Example: Mixing terms and histogram value sources

The following example creates composite buckets that combine the `product` field (using the `terms` value source) and the `price` field (using the `histogram` value source):

```json
GET /test_index/_search
{
  "size": 0,
  "aggs": {
    "my_buckets": {
      "composite": {
        "sources": [
          { "product": { "terms": { "field": "product.keyword" } } },
          { "price_range": { "histogram": { "field": "price", "interval": 10 } } }
        ]
      }
    }
  }
}
```
{% include copy-curl.html %}

This query defines two value sources:

- `product`: This source uses the `terms` value source to create buckets for each unique value of the `product.keyword` field.
- `price_range`: This source uses the `histogram` value source to create buckets based on the `price` field, grouped into intervals of `10`.
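
To make the interval grouping concrete, the following Python sketch computes the key of the bucket that a given price falls into, using the usual rounding-down rule for histogram buckets. The `histogram_bucket_key` function is a hypothetical helper, and the prices for Jeans and Sneakers are made-up values chosen only for illustration.

```python
import math


def histogram_bucket_key(value, interval, offset=0.0):
    """Return the lower bound of the histogram bucket that contains value."""
    # Round down to the nearest multiple of interval, shifted by offset.
    return math.floor((value - offset) / interval) * interval + offset


# Only the T-Shirt price comes from the sample document; the others are hypothetical.
prices = {"T-Shirt": 19.99, "Jeans": 45.00, "Sneakers": 79.50}

for name, price in sorted(prices.items()):
    print(name, histogram_bucket_key(price, 10))
```

Each key is the lower bound of its interval, so a price of `19.99` with an interval of `10` belongs to the bucket keyed `10`.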

The resulting composite buckets will have a structure similar to the following example:

```json
{
  "buckets": [
    {
      "key": {
        "product": "Jeans",
        "price_range": 40
      },
      "doc_count": 1
    },
    {
      "key": {
        "product": "Sneakers",
        "price_range": 70
      },
      "doc_count": 1
    },
    {
      "key": {
        "product": "T-Shirt",
        "price_range": 10
      },
      "doc_count": 1
    }
  ]
}
```

Each composite bucket contains the product name and the lower bound of its price interval, allowing you to analyze how products are distributed across price ranges.

---

## Example: Mixing date histogram and geotile grid value sources

The following example combines the `date_histogram` and `geotile_grid` value sources to create composite buckets based on timestamps and geographic locations:

```json
GET /test_index/_search
{
  "size": 0,
  "aggs": {
    "my_buckets": {
      "composite": {
        "sources": [
          { "date": { "date_histogram": { "field": "timestamp", "calendar_interval": "1d" } } },
          { "location": { "geotile_grid": { "field": "location", "precision": 3 } } }
        ]
      }
    }
  }
}
```
{% include copy-curl.html %}

This query defines two value sources:

- `date`: This source uses the `date_histogram` value source to group documents based on the day of the `timestamp` field.
- `location`: This source uses the `geotile_grid` value source to aggregate `geo_point` data into buckets that correspond to cells in a grid, with a precision of `3`.
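
The following Python sketch approximates how a single document could be mapped to a day bucket and a grid cell. It assumes daily buckets aligned to UTC midnight and keyed in epoch milliseconds, and the standard Web Mercator tiling scheme with keys written as `zoom/x/y`. Both helper functions are hypothetical and do not reproduce the exact OpenSearch implementation.

```python
import math
from datetime import datetime, timezone

MILLIS_PER_DAY = 86_400_000


def day_bucket_key(ts):
    """Return the epoch-millisecond key of the UTC day that contains ts."""
    millis = int(ts.timestamp() * 1000)
    # Drop the time-of-day part to land on UTC midnight.
    return millis - millis % MILLIS_PER_DAY


def geotile_key(lat, lon, precision):
    """Return a zoom/x/y tile key for a point, assuming Web Mercator tiling."""
    n = 2 ** precision  # Number of tiles along each axis at this zoom level.
    x = math.floor((lon + 180.0) / 360.0 * n)
    y = math.floor((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    # Clamp to the valid tile range for points on or beyond the grid edge.
    x = min(max(x, 0), n - 1)
    y = min(max(y, 0), n - 1)
    return f"{precision}/{x}/{y}"


ts = datetime(2024, 7, 9, 15, 30, tzinfo=timezone.utc)
print(day_bucket_key(ts), geotile_key(0.0, 0.0, 3))
```

At a precision of `3`, the grid has 8 tiles along each axis, so each cell covers a large area. Higher precision values produce smaller cells and more buckets.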

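The response has a structure similar to the following example. An empty `buckets` array, as shown here, typically means that none of the matching documents contain values for every source field, because by default the composite aggregation ignores documents that are missing a value for any source: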

```json
{
  "took": 34,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 3,
      "relation": "eq"
    },
    "max_score": null,
    "hits": []
  },
  "aggregations": {
    "my_buckets": {
      "buckets": []
    }
  }
}
```

## Considerations

When mixing value sources in the `composite` aggregation, keep the following point in mind:

- <SME: What are the considerations? List them here.>

Check warning on line 137 in _aggregations/bucket/mixing-value-sources.md

GitHub Actions / vale

[vale] _aggregations/bucket/mixing-value-sources.md#L137

[OpenSearch.Please] Using 'Please' is unnecessary. Remove.
Raw output
{"message": "[OpenSearch.Please] Using 'Please' is unnecessary. Remove.", "location": {"path": "_aggregations/bucket/mixing-value-sources.md", "range": {"start": {"line": 137, "column": 38}}}, "severity": "WARNING"}
8 changes: 8 additions & 0 deletions _aggregations/bucket/ordering-composite-buckets.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
---
layout: default
title: Ordering composite buckets
parent: Composite
grand_parent: Bucket aggregations
great_grand_parent: Aggregations
nav_order: 15
---
8 changes: 8 additions & 0 deletions _aggregations/bucket/size-pagination.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
---
layout: default
title: Size and pagination
parent: Composite
grand_parent: Bucket aggregations
great_grand_parent: Aggregations
nav_order: 25
---
8 changes: 8 additions & 0 deletions _aggregations/bucket/subaggregations.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
---
layout: default
title: Subaggregations
parent: Composite
grand_parent: Bucket aggregations
great_grand_parent: Aggregations
nav_order: 30
---