Commit 69d8c12

Merge pull request #2 from hydrator/fix/fix-resources (Fix/fix resources)
2 parents b3f4ce2 + 46d6a44

3 files changed: +146 -0 lines changed


docs/Elasticsearch-batchsink.md (46 additions, 0 deletions)

# Elasticsearch Batch Sink


Description
-----------
Takes the Structured Record from the input source and converts it to a JSON string, then indexes it in
Elasticsearch using the index, type, and idField specified by the user. The Elasticsearch server should
be running prior to creating the application.

This sink is used whenever you need to write to an Elasticsearch server. For example, you
may want to parse a file and read its contents into Elasticsearch, which you can achieve
with a stream batch source and Elasticsearch as a sink.


Configuration
-------------
**referenceName:** This will be used to uniquely identify this sink for lineage, annotating metadata, etc.

**es.host:** The hostname and port for the Elasticsearch instance. (Macro-enabled)

**es.index:** The name of the index where the data will be stored; if the index does not
already exist, it will be created using Elasticsearch's default properties. (Macro-enabled)

**es.type:** The name of the type where the data will be stored; if it does not already
exist, it will be created. (Macro-enabled)

**es.idField:** The field that will determine the id for the document; it should match a fieldname
in the Structured Record of the input. (Macro-enabled)


Example
-------
This example connects to Elasticsearch, which is running locally, and writes the data to
the specified index (*megacorp*) and type (*employee*). The data is indexed using the *id* field
in the record. On each run, the documents will be updated if they are still present in the source:

    {
        "name": "Elasticsearch",
        "type": "batchsink",
        "properties": {
            "es.host": "localhost:9200",
            "es.index": "megacorp",
            "es.type": "employee",
            "es.idField": "id"
        }
    }
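The record-to-document conversion described above can be sketched in a few lines. This is a minimal illustration, not the plugin's actual code; `record_to_document` is a hypothetical helper, and the record is modeled as a plain dict:

```python
import json

def record_to_document(record: dict, id_field: str) -> tuple:
    """Convert a structured record (modeled as a dict) into the
    (document id, JSON body) pair the sink would index."""
    if id_field not in record:
        raise ValueError(f"idField '{id_field}' not found in record")
    doc_id = record[id_field]
    body = json.dumps(record)  # the whole record becomes the document
    return doc_id, body

doc_id, body = record_to_document({"id": 1, "name": "John", "age": 32}, "id")
print(doc_id)  # 1
```

Because the document id comes from the record's own *id* field, re-running the pipeline re-indexes the same ids, which is why existing documents are updated rather than duplicated.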

docs/Elasticsearch-batchsource.md (52 additions, 0 deletions)

# Elasticsearch Batch Source


Description
-----------
Pulls documents from Elasticsearch according to the query specified by the user and converts each document
to a Structured Record with the fields and schema specified by the user. The Elasticsearch server should
be running prior to creating the application.

This source is used whenever you need to read data from Elasticsearch. For example, you may want to read
in an index and type from Elasticsearch and store the data in an HBase table.


Configuration
-------------
**referenceName:** This will be used to uniquely identify this source for lineage, annotating metadata, etc.

**es.host:** The hostname and port for the Elasticsearch instance. (Macro-enabled)

**es.index:** The name of the index to query. (Macro-enabled)

**es.type:** The name of the type where the data is stored. (Macro-enabled)

**query:** The query to use to import data from the specified index and type;
see the Elasticsearch documentation for additional query examples. (Macro-enabled)

**schema:** The schema or mapping of the data in Elasticsearch.


Example
-------
This example connects to Elasticsearch, which is running locally, and reads in records in the
specified index (*megacorp*) and type (*employee*) that match the query, which (in this case) selects all records.
All data from the index will be read on each run:

    {
        "name": "Elasticsearch",
        "type": "batchsource",
        "properties": {
            "es.host": "localhost:9200",
            "es.index": "megacorp",
            "es.type": "employee",
            "query": "?q=*",
            "schema": "{
                \"type\":\"record\",
                \"name\":\"etlSchemaBody\",
                \"fields\":[
                    {\"name\":\"id\",\"type\":\"long\"},
                    {\"name\":\"name\",\"type\":\"string\"},
                    {\"name\":\"age\",\"type\":\"int\"}]}"
        }
    }
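The document-to-record conversion can be sketched as follows: each Elasticsearch hit's `_source` is projected onto the fields named in the configured schema. This is an illustrative sketch, not the plugin's implementation; `hit_to_record` is a hypothetical helper, and the hit is a hand-built sample in the shape Elasticsearch's search API returns:

```python
import json

def hit_to_record(hit: dict, schema: dict) -> dict:
    """Project a search hit's _source onto the fields named in an
    Avro-style schema, as the source's conversion step would."""
    field_names = [f["name"] for f in schema["fields"]]
    source = hit["_source"]
    return {name: source.get(name) for name in field_names}

# The schema string from the example config, parsed into a dict:
schema = json.loads(
    '{"type":"record","name":"etlSchemaBody","fields":['
    '{"name":"id","type":"long"},'
    '{"name":"name","type":"string"},'
    '{"name":"age","type":"int"}]}'
)
# A sample hit as returned for the "?q=*" query:
hit = {"_index": "megacorp", "_type": "employee", "_id": "1",
       "_source": {"id": 1, "name": "John", "age": 32}}
print(hit_to_record(hit, schema))
```

Fields present in `_source` but absent from the schema are dropped; fields named in the schema but missing from a document come back as nulls.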

docs/Elasticsearch-realtimesink.md (48 additions, 0 deletions)

# Elasticsearch Real-time Sink


Description
-----------
Takes the Structured Record from the input source and converts it to a JSON string, then indexes it in
Elasticsearch using the index, type, and idField specified by the user. The Elasticsearch server should
be running prior to creating the application.

This sink is used whenever you need to write data into Elasticsearch.
For example, you may want to read Kafka logs and store them in Elasticsearch
to be able to search on them.


Configuration
-------------
**referenceName:** This will be used to uniquely identify this sink for lineage, annotating metadata, etc.

**es.cluster:** The name of the cluster to connect to; defaults to ``'elasticsearch'``.

**es.transportAddresses:** The addresses for nodes; specify the address for at least one node,
and separate others by commas; other nodes will be sniffed out.

**es.index:** The name of the index where the data will be stored; if the index does not already exist,
it will be created using Elasticsearch's default properties.

**es.type:** The name of the type where the data will be stored; if it does not already exist, it will be created.

**es.idField:** The field that will determine the id for the document; it should match a fieldname in the
Structured Record of the input; if left blank, Elasticsearch will create a unique id for each document.


Example
-------
This example connects to Elasticsearch, which is running locally, and writes the data to
the specified index (*logs*) and type (*cdap*). The data is indexed using the timestamp (*ts*) field
in the record:

    {
        "name": "Elasticsearch",
        "type": "realtimesink",
        "properties": {
            "es.transportAddresses": "localhost:9300",
            "es.index": "logs",
            "es.type": "cdap",
            "es.idField": "ts"
        }
    }
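The comma-separated `es.transportAddresses` value described above has to be split into individual (host, port) pairs before the sink can connect. A minimal sketch of that parsing, assuming a missing port falls back to 9300 (Elasticsearch's standard transport port); `parse_transport_addresses` is a hypothetical helper, not part of the plugin:

```python
def parse_transport_addresses(value: str, default_port: int = 9300):
    """Split a comma-separated es.transportAddresses value into
    (host, port) pairs; an address without a port gets the default."""
    pairs = []
    for addr in value.split(","):
        addr = addr.strip()
        host, _, port = addr.partition(":")
        pairs.append((host, int(port) if port else default_port))
    return pairs

print(parse_transport_addresses("localhost:9300, node2"))
# [('localhost', 9300), ('node2', 9300)]
```

Only one reachable address is strictly needed, since the remaining cluster nodes are sniffed out from it.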
