Welcome to the Elasticsearch Beyonder project.
Historically, this project comes from the spring-elasticsearch project.
The goal of this project is to provide a simple Java library which helps create indices, mappings, templates and more when you start your application.
elasticsearch-beyonder | elasticsearch | Release date |
---|---|---|
9.0-SNAPSHOT | 8.x | |
8.17 | 8.x | 2025-03-06 |
7.16 | 7.x | 2022-01-13 |
7.15 | 7.x | 2021-10-14 |
7.13.2 | 7.x | 2021-07-22 |
7.13.1 | 7.x | 2021-06-21 |
7.13 | 7.x | 2021-06-03 |
7.5 | 7.x | 2020-01-15 |
7.0 | 7.0 -> 7.x | 2019-04-04 |
6.5 | 6.5 -> 6.x | 2019-01-04 |
6.3 | 6.3 -> 6.4 | 2018-07-21 |
6.0 | 6.0 -> 6.2 | 2018-02-05 |
5.1 | 5.x, 6.x | 2017-07-12 |
5.0 | 5.x, 6.x | 2017-07-11 |
2.1.0 | 2.0, 2.1 | 2015-11-25 |
2.0.0 | 2.0 | 2015-10-24 |
1.5.0 | 1.5 | 2015-03-27 |
1.4.1 | 1.4 | 2015-03-02 |
1.4.0 | 1.4 | 2015-02-27 |
- For 9.x elasticsearch versions, you are reading the latest documentation.
- For 8.x elasticsearch versions, look at es-8.x branch.
- For 7.x elasticsearch versions, look at es-7.x branch.
- For 6.x elasticsearch versions, look at es-6.x branch.
- For 5.x elasticsearch versions, look at es-5.x branch.
- For 2.x elasticsearch versions, look at es-2.1 branch.
- Update project to Elasticsearch 9.0.0-SNAPSHOT.
- Update required JVM to Java 17
- Update project to Elasticsearch 8.17.2.
- Remove the deprecated Transport Client.
- The `_pipeline` dir is not supported anymore. Use the `_pipelines` dir.
- The `_template` and `_templates` dirs are not supported anymore. Use the `_index_templates` and `_component_templates` dirs.
- The method `start(RestClient client, String root, boolean merge, boolean force)` is now `start(RestClient client, String root, boolean force)`.
- Support for sample datasets has been added. If we detect a directory named `_data` in the classpath, we will try to load sample data from it. We support both `ndjson` files, which will be loaded using the Bulk API, and `json` files, which will be loaded using the Index API.
- Update Log4J (optional) dependency to 2.17.1.
- Add support for Index Lifecycles.
- Added back support for Java 8.
- The `_pipeline` dir has been deprecated by the `_pipelines` dir.
- The `_template` dir has been deprecated by the `_templates` dir.
- The `force` parameter is not applied anymore to pipelines, so pipelines are always updated.
- The `force` parameter is not applied anymore to templates, component templates and index templates, so they are always updated.
- The method `start(RestClient client, String root, boolean merge, boolean force)` is now deprecated as the `merge` parameter is not used anymore. Use the `start(RestClient client, String root, boolean force)` method instead (see the migration sketch after this list).
- Support for the aliases API has been added.
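For existing callers, the migration is mechanical: drop the `merge` argument. A minimal sketch (the root directory name and the `force` value are illustrative):

```java
// Before (deprecated): the merge parameter is ignored
// ElasticsearchBeyonder.start(client, "elasticsearch", true, false);

// After: same behavior, without the merge parameter
ElasticsearchBeyonder.start(client, "elasticsearch", false);
```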
Import elasticsearch-beyonder in your project `pom.xml` file:
<dependency>
<groupId>fr.pilato.elasticsearch</groupId>
<artifactId>elasticsearch-beyonder</artifactId>
<version>9.0-SNAPSHOT</version>
</dependency>
You also need to import the elasticsearch client you want to use by adding one of the following dependencies to your `pom.xml` file.
For example, here is how to import the REST Client to your project:
<dependency>
<groupId>org.elasticsearch.client</groupId>
<artifactId>elasticsearch-rest-client</artifactId>
<version>9.0.0-SNAPSHOT</version>
</dependency>
For example, here is how to import the Transport Client to your project (deprecated):
<dependency>
<groupId>org.elasticsearch.client</groupId>
<artifactId>transport</artifactId>
<version>9.0.0-SNAPSHOT</version>
</dependency>
If you are using a SNAPSHOT version of elasticsearch-beyonder, you need to add the Sonatype repository to your pom.xml
file:
<repositories>
<repository>
<name>Central Portal Snapshots</name>
<id>central-portal-snapshots</id>
<url>https://central.sonatype.com/repository/maven-snapshots/</url>
<releases>
<enabled>false</enabled>
</releases>
<snapshots>
<enabled>true</enabled>
</snapshots>
</repository>
</repositories>
Elasticsearch provides a Low Level Rest Client. You can create it like this:
RestClient client = RestClient.builder(HttpHost.create("http://127.0.0.1:9200")).build();
Once you have the client, you can use it to manage automatic creation of index, mappings, templates and aliases. To activate those features, you only need to pass to Beyonder the Rest Client instance:
ElasticsearchBeyonder.start(client);
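Putting these pieces together, a minimal bootstrap could look like the following sketch. The URL is illustrative, and the `ElasticsearchBeyonder` class is assumed to live in the `fr.pilato.elasticsearch.tools` package:

```java
import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import fr.pilato.elasticsearch.tools.ElasticsearchBeyonder;

public class BeyonderBootstrap {
    public static void main(String[] args) throws Exception {
        // Build the Low Level Rest Client pointing at a local cluster (illustrative URL)
        try (RestClient client = RestClient.builder(HttpHost.create("http://127.0.0.1:9200")).build()) {
            // Let Beyonder create indices, mappings, templates, aliases... from classpath resources
            ElasticsearchBeyonder.start(client);
        }
    }
}
```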
By default, Beyonder will try to locate resources in the `elasticsearch` directory within your classpath.
We will use this default value for the rest of the documentation.
But you can change this using:
ElasticsearchBeyonder.start(client, "models/myelasticsearch");
In that case, Beyonder will search for resources under `models/myelasticsearch`.
There is also a more complete version of the `start` method:
ElasticsearchBeyonder.start(client, "models/myelasticsearch", true);
This last parameter is known as `force`. It removes any existing index managed by Beyonder.
This is very useful for integration testing, but very dangerous in production.
When your cluster is secured, you can for example use Basic Authentication:
CredentialsProvider credentialsProvider = new BasicCredentialsProvider();
credentialsProvider.setCredentials(AuthScope.ANY, new UsernamePasswordCredentials("elastic", "changeme"));
RestClient client = RestClient.builder(HttpHost.create("http://127.0.0.1:9200"))
.setHttpClientConfigCallback(hcb -> hcb.setDefaultCredentialsProvider(credentialsProvider)).build();
ElasticsearchBeyonder.start(client);
When Beyonder starts, it tries to find index names and settings in the classpath.
If you add to your classpath a directory named `elasticsearch/twitter`, the `twitter` index will be automatically created
at startup if it does not exist yet.
If you add to your classpath a file named `elasticsearch/twitter/_settings.json`, it will be automatically applied to define
the settings for your `twitter` index.
For example, create the following file `src/main/resources/elasticsearch/twitter/_settings.json` in your project:
{
"settings": {
"number_of_shards": 3,
"number_of_replicas": 0
},
"mappings": {
"properties": {
"message": { "type": "text" },
"foo": { "type": "text" }
}
}
}
By default, Beyonder will not overwrite an index if it already exists.
This can be overridden by setting `force` to `true` in the expanded factory method `ElasticsearchBeyonder.start()`.
You can also provide a file named `_update_settings.json` to update your index settings,
and a file named `_update_mapping.json` if you want to update an existing mapping.
Note that Elasticsearch does not allow updating all settings and mappings.
You can for example add a new field or change the `search_analyzer` for a given field, but you cannot modify
the field `type`.
Building on the previous example, you can create an `elasticsearch/twitter/_update_settings.json` file to update the
number of replicas:
{
"number_of_replicas" : 1
}
And you can create `elasticsearch/twitter/_update_mapping.json`:
{
"properties": {
"message" : {"type" : "text", "search_analyzer": "keyword" },
"bar" : { "type" : "text" }
}
}
This will change the `search_analyzer` for the `message` field and add a new field named `bar`.
All other existing fields (like `foo` in the previous example) won't be changed.
If you would like to use date math expressions for the index name,
you can use the URI encoded version of the expression.
For example, the following directory structure will end up creating an index named `<my-index-{now/d}>`,
which Elasticsearch resolves to the current date:
.
└── %3Cmy-index-%7Bnow%2Fd%7D%3E
├── _data
│ └── bulk.ndjson
└── _settings.json
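If you need to generate such a directory name, the standard URL encoder produces the expected form. A small illustrative sketch:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class EncodeIndexName {
    public static void main(String[] args) {
        // Encode the date math expression so it can be used as a directory name
        String encoded = URLEncoder.encode("<my-index-{now/d}>", StandardCharsets.UTF_8);
        // Prints %3Cmy-index-%7Bnow%2Fd%7D%3E
        System.out.println(encoded);
    }
}
```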
The aliases feature is helpful to add or remove an alias on a given index. You could also use index templates to do that
automatically when the index is created, but you can also define a file `elasticsearch/_aliases.json`:
{
"actions" : [
{ "remove": { "index": "test_1", "alias": "test" } },
{ "add": { "index": "test_2", "alias": "test" } }
]
}
When Beyonder starts, it will automatically send the content to the Aliases API.
Since version 7.13, the new index template management API is supported. It allows defining both component templates and index templates.
To define component templates, you can create json files within the `elasticsearch/_component_templates/` dir.
Let's first create an `elasticsearch/_component_templates/component1.json`:
{
"template": {
"mappings": {
"properties": {
"@timestamp": {
"type": "date"
}
}
}
}
}
Then create a second component template as `elasticsearch/_component_templates/component2.json`:
{
"template": {
"mappings": {
"runtime": {
"day_of_week": {
"type": "keyword",
"script": {
"source": "emit(doc['@timestamp'].value.dayOfWeekEnum.getDisplayName(TextStyle.FULL, Locale.ROOT))"
}
}
}
}
}
}
When Beyonder starts, it will create two component templates in elasticsearch, named respectively `component1` and `component2`.
To define index templates, you can create json files within the `elasticsearch/_index_templates/` dir.
Let's create an `elasticsearch/_index_templates/template_1.json`:
{
"index_patterns": ["te*", "bar*"],
"template": {
"settings": {
"number_of_shards": 1
},
"mappings": {
"_source": {
"enabled": true
},
"properties": {
"host_name": {
"type": "keyword"
},
"created_at": {
"type": "date",
"format": "EEE MMM dd HH:mm:ss Z yyyy"
}
}
},
"aliases": {
"mydata": { }
}
},
"priority": 500,
"composed_of": ["component1", "component2"],
"version": 3,
"_meta": {
"description": "my custom"
}
}
When Beyonder starts, it will create the index template named `template_1` in elasticsearch.
Note that this index template references two component templates which must either be available before Beyonder starts
or be defined within the `_component_templates` dir as we saw just before.
A pipeline is a definition of a series of processors that are executed, in the order they are declared, while documents are being indexed. Please note that this feature is only supported when you use the REST client, not the Transport client.
For example, to set one field's value based on another field by using a Set processor, you can add a file named `elasticsearch/_pipelines/set_field_processor.json` in your project:
{
"description" : "Twitter pipeline",
"processors" : [
{
"set" : {
"field": "copy",
"value": "{{otherField}}"
}
}
]
}
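Once Beyonder has created the pipeline (its name is presumably derived from the filename, here `set_field_processor`, following the same convention as templates), you can reference it when indexing documents. A minimal sketch using the Low Level Rest Client, with an illustrative index name:

```java
import org.elasticsearch.client.Request;
import org.elasticsearch.client.RestClient;

public class IndexThroughPipeline {
    public static void index(RestClient client) throws Exception {
        // Index a document into the twitter index, applying the ingest pipeline at index time
        Request request = new Request("POST", "/twitter/_doc");
        request.addParameter("pipeline", "set_field_processor");
        request.setJsonEntity("{\"otherField\": \"a value to copy\"}");
        client.performRequest(request);
    }
}
```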
To define an index lifecycle, you can create json files within the `elasticsearch/_index_lifecycles/` dir.
Let's create an `elasticsearch/_index_lifecycles/my_lifecycle.json`:
{
"policy": {
"phases": {
"warm": {
"min_age": "10d",
"actions": {
"forcemerge": {
"max_num_segments": 1
}
}
},
"delete": {
"min_age": "30d",
"actions": {
"delete": {}
}
}
}
}
}
When Beyonder starts, it will create the index lifecycle policy named `my_lifecycle` in elasticsearch.
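For the policy to have any effect, an index needs to reference it in its settings. For example, a hypothetical `elasticsearch/logs/_settings.json` could point at the policy using the standard `index.lifecycle.name` setting:

```json
{
  "settings": {
    "index.lifecycle.name": "my_lifecycle"
  }
}
```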
If you want to load sample data into your cluster, you can create a directory named `_data` in your classpath.
We support both `ndjson` files, which will be loaded using the Bulk API (recommended), and `json` files, which will be loaded
using the Index API (slower).
If the `_data` directory is created within an index directory, the data will be loaded only for this index, meaning
that you don't need to define the index name in the bulk headers.
For example, let's assume that you have the following files under your `elasticsearch` directory:
.
├── _data
│ ├── bulk-001.ndjson
│ └── bulk-002.ndjson
├── person
│ ├── _data
│ │ ├── doc001.json
│ │ ├── doc002.json
│ │ ├── doc003.json
│ │ └── doc004.json
│ └── _index_templates
│ └── person.json
├── test_1
│ ├── _data
│ │ ├── bulk-001.ndjson
│ │ └── bulk-002.ndjson
│ └── _settings.json
├── test_2
│ ├── _data
│ │ └── abcd.ndjson
│ └── _settings.json
└── twitter
└── _settings.json
The _data/bulk-001.ndjson
file contains:
{ "index" : { "_index" : "twitter" } }
{ "message" : "message 1" }
{ "index" : { "_index" : "twitter" } }
{ "message" : "message 2" }
{ "index" : { "_index" : "twitter" } }
{ "message" : "message 3" }
{ "index" : { "_index" : "twitter" } }
{ "message" : "message 4" }
{ "index" : { "_index" : "twitter" } }
{ "message" : "message 5" }
The person/_data/doc001.json
file contains something like:
{
"name": "John Doe"
}
Note that it contains no header and can be pretty-printed, unlike the `ndjson` format.
JSON documents can only be added within a given index directory, not at the root level of the `_data` directory.
The test_1/_data/bulk-001.ndjson
file contains:
{ "index" : { } }
{ "message" : "message 1" }
{ "index" : { } }
{ "message" : "message 2" }
{ "index" : { } }
{ "message" : "message 3" }
{ "index" : { } }
{ "message" : "message 4" }
{ "index" : { } }
{ "message" : "message 5" }
And the test_2/_data/abcd.ndjson
file contains:
{ "index" : { } }
{ "message" : "message 1" }
{ "index" : { } }
{ "message" : "message 2" }
{ "index" : { } }
{ "message" : "message 3" }
{ "index" : { } }
{ "message" : "message 4" }
{ "index" : { } }
{ "message" : "message 5" }
When Beyonder starts, it will:
- Create the `person` index with the mapping defined in `elasticsearch/person/_index_templates/person.json`.
- Create the `test_1` index with the settings defined in `elasticsearch/test_1/_settings.json`.
- Create the `test_2` index with the settings defined in `elasticsearch/test_2/_settings.json`.
- Create the `twitter` index with the settings defined in `elasticsearch/twitter/_settings.json`.
- Load the data from `elasticsearch/test_1/_data/bulk-001.ndjson` and then `elasticsearch/test_1/_data/bulk-002.ndjson` into the `test_1` index.
- Load the data from `elasticsearch/test_2/_data/abcd.ndjson` into the `test_2` index.
- Load the data from `elasticsearch/_data/bulk-001.ndjson` and `elasticsearch/_data/bulk-002.ndjson` into the indices specified within the bulk files.
- Load the data from the `elasticsearch/person/_data/doc*.json` files into the `person` index.
Note that files are sorted by name before being loaded, which means that a file `bulk-001.ndjson` will be loaded before
`bulk-002.ndjson`.
If the index already exists when Beyonder starts, the data won't be loaded unless you are using the `force` option.
This does not apply to the root `_data` directory, whose data is always loaded at every startup.
This project comes with unit tests and integration tests.
You can disable running them by using the `skipTests` option as follows:
mvn clean install -DskipTests
If you want to skip only the unit tests, use the `skipUnitTests` option:
mvn clean install -DskipUnitTests
Integration tests launch a Docker instance using TestContainers, so you need to have Docker installed.
If you want to skip the integration tests, use the `skipIntegTests` option:
mvn clean install -DskipIntegTests
If you wish to run integration tests against a cluster which is already running externally, you can configure the following settings to locate your cluster:
setting | default |
---|---|
tests.cluster | https://127.0.0.1:9200 |
tests.cluster.user | elastic |
tests.cluster.pass | changeme |
For example:
mvn clean install -Dtests.cluster=https://127.0.0.1:9200
If you want to run your tests against an Elastic Cloud instance, you can use something like:
mvn clean install \
-Dtests.cluster=https://CLUSTERID.us-central1.gcp.cloud.es.io:443 \
-Dtests.cluster.user=elastic \
-Dtests.cluster.pass=GENERATEDPASSWORD
To release the project, you simply need to run:
./release.sh
If you want to just test the release without deploying anything, run:
DRY_RUN=1 ./release.sh
I was looking for a cool name in the list of Marvel characters and found that Beyonder was actually a very powerful character.
This project gives some features beyond elasticsearch itself. :)
This software is licensed under the Apache 2 license, quoted below.
Copyright 2011-2025 David Pilato
Licensed under the Apache License, Version 2.0 (the "License"); you may not
use this file except in compliance with the License. You may obtain a copy of
the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
License for the specific language governing permissions and limitations under
the License.