Commit 69d8c12

Merge pull request #2 from hydrator/fix/fix-resources (Fix/fix resources)
2 parents b3f4ce2 + 46d6a44

3 files changed: +146 -0 lines changed


docs/Elasticsearch-batchsink.md (46 additions, 0 deletions)

# Elasticsearch Batch Sink


Description
-----------
Takes the Structured Record from the input source and converts it to a JSON string, then indexes it in
Elasticsearch using the index, type, and idField specified by the user. The Elasticsearch server should
be running prior to creating the application.

This sink is used whenever you need to write to an Elasticsearch server. For example, you
may want to parse a file and read its contents into Elasticsearch, which you can achieve
with a stream batch source and Elasticsearch as a sink.


Configuration
-------------
**referenceName:** This will be used to uniquely identify this sink for lineage, annotating metadata, etc.

**es.host:** The hostname and port for the Elasticsearch instance. (Macro-enabled)

**es.index:** The name of the index where the data will be stored; if the index does not
already exist, it will be created using Elasticsearch's default properties. (Macro-enabled)

**es.type:** The name of the type where the data will be stored; if it does not already
exist, it will be created. (Macro-enabled)

**es.idField:** The field that will determine the id for the document; it should match a fieldname
in the Structured Record of the input. (Macro-enabled)


Example
-------
This example connects to Elasticsearch, which is running locally, and writes the data to
the specified index (*megacorp*) and type (*employee*). The data is indexed using the *id* field
in the record. On each run, the documents will be updated if they are still present in the source:

    {
        "name": "Elasticsearch",
        "type": "batchsink",
        "properties": {
            "es.host": "localhost:9200",
            "es.index": "megacorp",
            "es.type": "employee",
            "es.idField": "id"
        }
    }
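The record-to-document conversion described above can be sketched in a few lines. This is a minimal illustration, not the plugin's actual code; `record_to_document` is a hypothetical helper, and the record is modeled as a plain dict:

```python
import json

def record_to_document(record: dict, id_field: str) -> tuple:
    """Convert a structured record (modeled as a dict) into the
    (document id, JSON body) pair the sink would index."""
    if id_field not in record:
        raise ValueError(f"idField '{id_field}' not found in record")
    doc_id = record[id_field]
    body = json.dumps(record)  # the whole record becomes the document
    return doc_id, body

doc_id, body = record_to_document({"id": 1, "name": "John", "age": 32}, "id")
print(doc_id)  # 1
```

Because the document id comes from the record's own *id* field, re-running the pipeline re-indexes the same ids, which is why existing documents are updated rather than duplicated.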

docs/Elasticsearch-batchsource.md (52 additions, 0 deletions)

# Elasticsearch Batch Source


Description
-----------
Pulls documents from Elasticsearch according to the query specified by the user and converts each document
to a Structured Record with the fields and schema specified by the user. The Elasticsearch server should
be running prior to creating the application.

This source is used whenever you need to read data from Elasticsearch. For example, you may want to read
in an index and type from Elasticsearch and store the data in an HBase table.


Configuration
-------------
**referenceName:** This will be used to uniquely identify this source for lineage, annotating metadata, etc.

**es.host:** The hostname and port for the Elasticsearch instance. (Macro-enabled)

**es.index:** The name of the index to query. (Macro-enabled)

**es.type:** The name of the type where the data is stored. (Macro-enabled)

**query:** The query to use to import data from the specified index and type;
see the Elasticsearch documentation for additional query examples. (Macro-enabled)

**schema:** The schema or mapping of the data in Elasticsearch.


Example
-------
This example connects to Elasticsearch, which is running locally, and reads in records in the
specified index (*megacorp*) and type (*employee*) that match the query, which (in this case) selects all records.
All data from the index will be read on each run:

    {
        "name": "Elasticsearch",
        "type": "batchsource",
        "properties": {
            "es.host": "localhost:9200",
            "es.index": "megacorp",
            "es.type": "employee",
            "query": "?q=*",
            "schema": "{
                \"type\":\"record\",
                \"name\":\"etlSchemaBody\",
                \"fields\":[
                    {\"name\":\"id\",\"type\":\"long\"},
                    {\"name\":\"name\",\"type\":\"string\"},
                    {\"name\":\"age\",\"type\":\"int\"}]}"
        }
    }
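The document-to-record conversion can be sketched as follows: each Elasticsearch hit's `_source` is projected onto the fields named in the configured schema. This is an illustrative sketch, not the plugin's implementation; `hit_to_record` is a hypothetical helper, and the hit is a hand-built sample in the shape Elasticsearch's search API returns:

```python
import json

def hit_to_record(hit: dict, schema: dict) -> dict:
    """Project a search hit's _source onto the fields named in an
    Avro-style schema, as the source's conversion step would."""
    field_names = [f["name"] for f in schema["fields"]]
    source = hit["_source"]
    return {name: source.get(name) for name in field_names}

# The schema string from the example config, parsed into a dict:
schema = json.loads(
    '{"type":"record","name":"etlSchemaBody","fields":['
    '{"name":"id","type":"long"},'
    '{"name":"name","type":"string"},'
    '{"name":"age","type":"int"}]}'
)
# A sample hit as returned for the "?q=*" query:
hit = {"_index": "megacorp", "_type": "employee", "_id": "1",
       "_source": {"id": 1, "name": "John", "age": 32}}
print(hit_to_record(hit, schema))
```

Fields present in `_source` but absent from the schema are dropped; fields named in the schema but missing from a document come back as nulls.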

docs/Elasticsearch-realtimesink.md (48 additions, 0 deletions)

# Elasticsearch Real-time Sink


Description
-----------
Takes the Structured Record from the input source and converts it to a JSON string, then indexes it in
Elasticsearch using the index, type, and idField specified by the user. The Elasticsearch server should
be running prior to creating the application.

This sink is used whenever you need to write data into Elasticsearch.
For example, you may want to read Kafka logs and store them in Elasticsearch
to be able to search on them.


Configuration
-------------
**referenceName:** This will be used to uniquely identify this sink for lineage, annotating metadata, etc.

**es.cluster:** The name of the cluster to connect to; defaults to ``'elasticsearch'``.

**es.transportAddresses:** The addresses for nodes; specify the address for at least one node,
and separate others by commas; other nodes will be sniffed out.

**es.index:** The name of the index where the data will be stored; if the index does not already exist,
it will be created using Elasticsearch's default properties.

**es.type:** The name of the type where the data will be stored; if it does not already exist, it will be created.

**es.idField:** The field that will determine the id for the document; it should match a fieldname in the
Structured Record of the input; if left blank, Elasticsearch will create a unique id for each document.


Example
-------
This example connects to Elasticsearch, which is running locally, and writes the data to
the specified index (*logs*) and type (*cdap*). The data is indexed using the timestamp (*ts*) field
in the record:

    {
        "name": "Elasticsearch",
        "type": "realtimesink",
        "properties": {
            "es.transportAddresses": "localhost:9300",
            "es.index": "logs",
            "es.type": "cdap",
            "es.idField": "ts"
        }
    }
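The comma-separated `es.transportAddresses` value described above has to be split into individual (host, port) pairs before the sink can connect. A minimal sketch of that parsing, assuming a missing port falls back to 9300 (Elasticsearch's standard transport port); `parse_transport_addresses` is a hypothetical helper, not part of the plugin:

```python
def parse_transport_addresses(value: str, default_port: int = 9300):
    """Split a comma-separated es.transportAddresses value into
    (host, port) pairs; an address without a port gets the default."""
    pairs = []
    for addr in value.split(","):
        addr = addr.strip()
        host, _, port = addr.partition(":")
        pairs.append((host, int(port) if port else default_port))
    return pairs

print(parse_transport_addresses("localhost:9300, node2"))
# [('localhost', 9300), ('node2', 9300)]
```

Only one reachable address is strictly needed, since the remaining cluster nodes are sniffed out from it.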
