TEIID-4594 initial parquet docs

shawkins · shawkins · commit d429714011ec · 2020-09-04T10:37:58.000-04:00
diff --git a/admin/Migration_Guide_From_Teiid_15.x.adoc b/admin/Migration_Guide_From_Teiid_15.x.adoc
@@ -0,0 +1,14 @@
+
+= Migration Guide From {{ book.productnameFull }} 15.x to 16.x
+
+{{ book.productnameFull }} strives to maintain consistency between all versions, but when necessary, breaking configuration and VDB/SQL changes are made, typically only in major releases.
+
+You should consult the release notes for compatibility and configuration changes from each minor version that you are upgrading over.  This guide expands upon the release notes included in the kit to cover changes since 15.x.
+
+If possible you should make your migration to {{ book.productnameFull }} 16 by first using {{ book.productnameFull }} 15.0.x.  {{ book.productnameFull }} 16 requires Java 8 and WildFly 19.1 (the same as Teiid 15).  See also the link:Migration_Guide_From_Teiid_14.x.adoc[14 to 15 Migration Guide].
+
+== Configuration Changes
+
+== Compatibility Changes
+
+* https://issues.redhat.com/browse/TEIID-6025[TEIID-6025] The file translator path and pattern no longer attempt a literal match first.  Instead they are always matched against a simplified glob syntax.  A literal match against the \* character now requires it to be escaped as \*\*.
diff --git a/reference/Release_Notes.adoc b/reference/Release_Notes.adoc
@@ -3,12 +3,12 @@
 :toc-placement: preamble
 :toc-title: Release Notes
 
-Teiid {{ book.fullVersionNumber }} adds performance features, microservice enablement, and fixes.
+Teiid {{ book.fullVersionNumber }} adds connectivity/performance features and fixes.
 
 == Highlights
 
 * https://issues.redhat.com/browse/TEIID-6021[TEIID-6021] Added bulk update and delete handling for Salesforce.  Bulk update can be controlled via a hint or now by execution properties.  Bulk delete behavior can also be specified as hard delete via a hint or execution property.
-* https://issues.redhat.com/browse/TEIID-4594[TEIID-4594] Thanks to Aditya Manglam Sharma we've added a parquet translator for use with file sources - namely file and s3.
+* https://issues.redhat.com/browse/TEIID-4594[TEIID-4594] Thanks to Aditya Manglam Sharma we've added a parquet translator for use with file sources - namely file, hdfs, and s3.
 
 == Compatibility Issues
 
diff --git a/reference/r_couchbase-translator.adoc b/reference/r_couchbase-translator.adoc
@@ -3,7 +3,7 @@
 :toc: manual
 :toc-placement: preamble
 
-The Couchbase Translator, known by the type name _couchbase_, exposes querying functionality to link:../admin/Couchbase_Data_Sources.adoc[Couchbase Data Sources]. The Couchbase Translator provide a SQL Integration solution for integrating Couchbase JSON document with relational model, which allows applications to use normal SQL queries against Couchbase Server, translating standard SQL-92 queries into equivalent N1QL client API calls. The translator translates {{ book.productnameFull }} push down commands into https://developer.couchbase.com/documentation/server/4.5/n1ql/n1ql-language-reference/index.html[Couchbase N1QL].
+The Couchbase Translator, known by the type name _couchbase_, provides a SQL integration solution for integrating Couchbase JSON documents with the relational model. It allows applications to use normal SQL queries against Couchbase Server by translating standard SQL-92 queries into equivalent N1QL client API calls. The translator translates {{ book.productnameFull }} push down commands into https://developer.couchbase.com/documentation/server/4.5/n1ql/n1ql-language-reference/index.html[Couchbase N1QL].
 
 == Usage
 
@@ -99,7 +99,7 @@ the dimension 4 nested array coulmn must define a NAMEINSOURCE with value `trave
 
 ==== Importer Properties 
 
-To ensure consistent support for your Couchbase data, use the importer properties to do futher defining in shcema generation.
+To ensure consistent support for your Couchbase data, use the importer properties to further refine schema generation.
 
 [source,xml]
 .*An example of importer properties*
@@ -130,7 +130,7 @@ To ensure consistent support for your Couchbase data, use the importer propertie
 `KEYSPACE`:`ATTRIBUTE`,`KEYSPACE`:`ATTRIBUTE`,`KEYSPACE`:`ATTRIBUTE`
 ----
 * KEYSPACE - the keyspaces must be under same namespace it either can be different one, or are same one. 
-* ATTRIBUTE - the attribute must be non object/array, resident on the root of keyspace, and it's type should be equivalent String. If a typeNameList set a specifc bucket(keyspace) has multiple types, and a document has all these types, the first one will be chose.
+* ATTRIBUTE - the attribute must not be an object or array, must reside at the root of the keyspace, and its type should be equivalent to String. If the typeNameList sets multiple types for a specific bucket (keyspace), and a document has all of these types, the first one will be chosen.
 
 For example, the TypeNameList below indicates that the buckets(keyspaces) test, default, and beer-sample use the type attribute to specify the type of each document, during schema generation, all type referenced value will be treated as table name.
 ----
@@ -280,5 +280,3 @@ getDocument(id, keyspace)
 ----
 call getDocument('customer-1', 'test')
 ----
-
-
diff --git a/reference/r_odata-v4-translator.adoc b/reference/r_odata-v4-translator.adoc
@@ -182,7 +182,7 @@ You can leave this property undefined. If the translator does not detect a confi
 it specifies the default name of the EntityContainer.
 
 {% if book.targetWildfly %}
-.JCA resource adapter
+== JCA resource adapter
 
 The resource adapter for this translator is a link:../admin/Web_Service_Data_Sources.adoc[Web Service Data Source].
 {% endif %}
diff --git a/reference/r_parquet-translator.adoc b/reference/r_parquet-translator.adoc
@@ -0,0 +1,33 @@
+
+= Parquet Translator
+:toc: manual
+:toc-placement: preamble
+
+The Parquet Translator, known by the type name _parquet_, exposes querying functionality to file data sources. The translator converts pushdown SQL into filtering logic for the Apache Parquet API.  Column projection is minimized, unnecessary file reads are skipped, and row level filters are applied before results are handed back to the engine.
+
+== Usage
+
+The Parquet Translator supports only SELECT statements with simple WHERE clauses consisting of comparisons and IS NULL checks.
+
+The Parquet primitive types are mapped as follows: INT96 -> BigInteger, INT64 -> Long, INT32 -> Integer, BOOLEAN -> Boolean, FLOAT -> Float, DOUBLE -> Double, BINARY (with the String logical annotation) -> String, BINARY (without the String annotation) -> byte array, FIXED_LEN_BYTE_ARRAY -> byte array.  The only logical type currently supported is a list of primitive types with no nesting.
+
+
+{% if book.targetWildfly %}
+== JCA Resource Adapter
+
+A {{ book.productnameFull }} File Resource Adapter should be used with this translator. See link:../admin/File_Data_Sources.adoc[File Data Sources], link:../admin/HDFS_Data_Sources.adoc[HDFS Data Sources], and link:../admin/S3_Data_Sources.adoc[S3 Data Sources].  Note that while the FTP data source is also a file source, it does not yet support wildcard matches, so it may only be used if there is no partitioning and all of the parquet files are in a single directory. 
+{% endif %}
+
+== Execution Properties
+There are currently no properties to set.
+
+== Schema Definition
+
+* Only the String, BigInteger, Long, Integer, and Boolean primitive types are properly supported for comparisons against partitioned column values.  Other value types may not convert correctly to or from the expected string representation.
+* Only primitive types (String, BinaryType, Long, Integer, Boolean, Double, and Float) are supported for comparisons against non-partitioned columns.  Thus array columns should have the option comparable 'uncomparable', as the translator cannot currently compare them.
+* The teiid_parquet:LOCATION property specifies the root location for the parquet files - it may be a single file, or a directory.
+* The teiid_parquet:PARTITIONED_COLUMNS property is a comma separated list of source names for partitions.  Only directory-based partitioning is supported.  A partitioned column is assumed to appear as a directory name in the file path in the form `PartitionedColumnName=Value`.  The column names must be listed in the order in which their directories appear in the path.  Commas are not supported in partitioned column source names, and leading/trailing whitespace is treated as meaningful.  This property is not yet utilized by the engine, but there will likely be a generalization at some point to allow cost based optimization to understand the partitioning.
+
+== Limitations
+
+* The parquet files are expected to be stored with the .parquet extension and may be Snappy-compressed.
diff --git a/wildfly/SUMMARY.adoc b/wildfly/SUMMARY.adoc
@@ -307,6 +307,7 @@
 .... link:reference/r_netezza-translator.adoc[Netezza translator]
 .... link:reference/r_oracle-translator.adoc[Oracle translator]
 .... link:reference/r_osisoft-pi-translator.adoc[OSISoft PI translator]
+.... link:reference/r_parquet-translator.adoc[Parquet translator]
 .... link:reference/r_postgresql-translator.adoc[PostgreSQL translator]
 .... link:reference/r_prestodb-translator.adoc[PrestoDB translator]
 .... link:reference/r_redshift-translator.adoc[Redshift translator]