Commit d429714

committed Sep 4, 2020
TEIID-4594 initial parquet docs
1 parent 140d1d1 commit d429714

6 files changed, +54 -8 lines changed
 
+14
@@ -0,0 +1,14 @@
+
+= Migration Guide From {{ book.productnameFull }} 15.x to 16.x
+
+{{ book.productnameFull }} strives to maintain consistency between all versions, but breaking configuration and VDB/SQL changes are made when necessary, typically only in major releases.
+
+You should consult the release notes for compatibility and configuration changes from each minor version that you are upgrading over. This guide expands upon the release notes included in the kit to cover changes since 15.x.
+
+If possible you should make your migration to {{ book.productnameFull }} 15 by first using {{ book.productnameFull }} 15.0.x. {{ book.productnameFull }} 16 requires Java 8 and WildFly 19.1 (the same as Teiid 15). See also link:Migration_Guide_From_Teiid_14.x.adoc[14 to 15 Migration Guide].
+
+== Configuration Changes
+
+== Compatibility Changes
+
+* https://issues.redhat.com/browse/TEIID-6025[TEIID-6025] The file translator path and pattern will no longer default to initially attempting a literal match. Instead, it will always match against a simplified glob syntax. Literal matches against the \* character now require it to be escaped as \*\*.
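
The TEIID-6025 note above can be illustrated with a small Python sketch. It is not the file translator's implementation: the helper names `simplified_glob_to_regex` and `matches` are hypothetical, and the sketch assumes that a single `*` matches any run of characters and that no other wildcard is in play. The only behavior taken from the note is that a doubled `**` stands for one literal `*`.

```python
import re


def simplified_glob_to_regex(pattern):
    """Translate a simplified glob into a compiled regex (illustrative only)."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            # Escaped form: a doubled asterisk is one literal '*'.
            parts.append(re.escape("*"))
            i += 2
        elif pattern[i] == "*":
            # Assumption: a single asterisk matches any run of characters.
            parts.append(".*")
            i += 1
        else:
            # Every other character matches itself literally.
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))


def matches(pattern, name):
    """Return True when the whole name matches the simplified glob."""
    return simplified_glob_to_regex(pattern).fullmatch(name) is not None
```

Under these assumptions, `matches("*.txt", "report.txt")` is true, while `matches("a**b.txt", "aXb.txt")` is false because the doubled asterisk only matches a literal `*`.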

‎reference/Release_Notes.adoc

+2 -2
@@ -3,12 +3,12 @@
 :toc-placement: preamble
 :toc-title: Release Notes
 
-Teiid {{ book.fullVersionNumber }} adds performance features, microservice enablement, and fixes.
+Teiid {{ book.fullVersionNumber }} adds connectivity/performance features and fixes.
 
 == Highlights
 
 * https://issues.redhat.com/browse/TEIID-6021[TEIID-6021] Added bulk update and delete handling for Salesforce. Bulk update can be controlled via a hint or now by execution properties. Bulk delete behavior can also be specified as hard delete via a hint or execution property.
-* https://issues.redhat.com/browse/TEIID-4594[TEIID-4594] Thanks to Aditya Manglam Sharma we've added a parquet translator for use with file sources - namely file and s3.
+* https://issues.redhat.com/browse/TEIID-4594[TEIID-4594] Thanks to Aditya Manglam Sharma we've added a parquet translator for use with file sources - namely file, hdfs, and s3.
 
 == Compatibility Issues
 
‎reference/r_couchbase-translator.adoc

+3 -5
@@ -3,7 +3,7 @@
 :toc: manual
 :toc-placement: preamble
 
-The Couchbase Translator, known by the type name _couchbase_, exposes querying functionality to link:../admin/Couchbase_Data_Sources.adoc[Couchbase Data Sources]. The Couchbase Translator provide a SQL Integration solution for integrating Couchbase JSON document with relational model, which allows applications to use normal SQL queries against Couchbase Server, translating standard SQL-92 queries into equivalent N1QL client API calls. The translator translates {{ book.productnameFull }} push down commands into https://developer.couchbase.com/documentation/server/4.5/n1ql/n1ql-language-reference/index.html[Couchbase N1QL].
+The Couchbase Translator, known by the type name _couchbase_, provides a SQL Integration solution for integrating Couchbase JSON document with relational model, which allows applications to use normal SQL queries against Couchbase Server, translating standard SQL-92 queries into equivalent N1QL client API calls. The translator translates {{ book.productnameFull }} push down commands into https://developer.couchbase.com/documentation/server/4.5/n1ql/n1ql-language-reference/index.html[Couchbase N1QL].
 
 == Usage
 
@@ -99,7 +99,7 @@ the dimension 4 nested array coulmn must define a NAMEINSOURCE with value `trave
 
 ==== Importer Properties
 
-To ensure consistent support for your Couchbase data, use the importer properties to do futher defining in shcema generation.
+To ensure consistent support for your Couchbase data, use the importer properties to do further defining in schema generation.
 
 [source,xml]
 .*An example of importer properties*
@@ -130,7 +130,7 @@ To ensure consistent support for your Couchbase data, use the importer propertie
 `KEYSPACE`:`ATTRIBUTE`,`KEYSPACE`:`ATTRIBUTE`,`KEYSPACE`:`ATTRIBUTE`
 ----
 * KEYSPACE - the keyspaces must be under same namespace it either can be different one, or are same one.
-* ATTRIBUTE - the attribute must be non object/array, resident on the root of keyspace, and it's type should be equivalent String. If a typeNameList set a specifc bucket(keyspace) has multiple types, and a document has all these types, the first one will be chose.
+* ATTRIBUTE - the attribute must be non object/array, resident on the root of keyspace, and it's type should be equivalent String. If a typeNameList set a specific bucket(keyspace) has multiple types, and a document has all these types, the first one will be chose.
 
 For example, the TypeNameList below indicates that the buckets(keyspaces) test, default, and beer-sample use the type attribute to specify the type of each document, during schema generation, all type referenced value will be treated as table name.
 ----
@@ -280,5 +280,3 @@ getDocument(id, keyspace)
 ----
 call getDocument('customer-1', 'test')
 ----
-
-
‎reference/r_odata-v4-translator.adoc

+1 -1
@@ -182,7 +182,7 @@ You can leave this property undefined. If the translator does not detect a confi
 it specifies the default name of the EntityContainer.
 
 {% if book.targetWildfly %}
-.JCA resource adapter
+== JCA resource adapter
 
 The resource adapter for this translator is a link:../admin/Web_Service_Data_Sources.adoc[Web Service Data Source].
 {% endif %}

‎reference/r_parquet-translator.adoc

+33
@@ -0,0 +1,33 @@
+
+= Parquet Translator
+:toc: manual
+:toc-placement: preamble
+
+The Parquet Translator, known by the type name _parquet_, exposes querying functionality to file data sources. The translator will convert pushdown SQL into filtering logic for the Apache Parquet API. Column projection will be minimized, unnecessary file reads ignored, and row level filters applied before results are handed back to the engine.
+
+== Usage
+
+The Parquet Translator supports SELECT involving only simple WHERE clauses consisting of comparisons and IS NULL checks.
+
+Parquet primitive types are mapped as follows: INT96 -> BigInteger, INT64 -> Long, INT32 -> Integer, Boolean -> Boolean, FLOAT -> Float, DOUBLE -> Double, Binary(of String Logical annotation) -> String, Binary(except String) -> byte array, FIXED_LEN_BYTE_ARRAY -> byte array. The only logical type currently supported is a list of primitive types with no nesting.
+
+
+{% if book.targetWildfly %}
+== JCA Resource Adapter
+
+A {{ book.productnameFull }} File Resource Adapter should be used with this translator. See link:../admin/File_Data_Sources.adoc[File Data Sources], link:../admin/HDFS_Data_Sources.adoc[HDFS Data Sources], and link:../admin/S3_Data_Sources.adoc[S3 Data Sources]. Note that while the FTP data source is also a file source, it does not yet support wildcard matches, so it may only be used if there is no partitioning and all of the parquet files are in a single directory.
+{% endif %}
+
+== Execution Properties
+There are currently no properties to set.
+
+== Schema Definition
+
+* Only the String, BigInteger, Long, Integer, and Boolean primitive types are properly supported for comparison of partitioned column values. Other value types may not appropriately convert to/from the expected string representation.
+* Only Primitive types (String, BinaryType, Long, Integer, Boolean, Double, and Float) are supported for comparison of non-partitioned columns. Thus array columns should have the option comparable 'uncomparable' as the translator cannot currently compare them.
+* The teiid_parquet:LOCATION property specifies the root location for the parquet files - it may be a single file, or a directory.
+* The teiid_parquet:PARTITIONED_COLUMNS property is a comma separated list of source names for partitions. Only directory-based partitioning is supported. We assume that a partitioned column is represented in the name of a file path in the manner “PartitionedColumnName=Value”. The property is expected to be in the order that the directories appear in. Commas are not supported in partitioned column source names and leading/trailing whitespace will be treated as meaningful. This property is not yet utilized by the engine, but there will likely be a generalization at some point to allow cost based optimization to understand the partitioning.
+
+== Limitations
+
+* The parquet files are expected to be stored with the .parquet extension and may be snappy compressed.
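
The directory-based partitioning described for teiid_parquet:PARTITIONED_COLUMNS can be pictured with a short Python sketch. It is an illustration under stated assumptions, not the translator's code: the function name `partition_values` and the sample path are hypothetical, and the sketch assumes `/`-separated paths. It follows the documented rules that names appear as `Name=Value` directory segments, in the order listed in the property, with whitespace in the list kept as-is.

```python
from pathlib import PurePosixPath


def partition_values(path, partitioned_columns):
    """Read partition values from 'Name=Value' directory segments of a path."""
    # The list is split on commas only; whitespace is deliberately not trimmed,
    # since the docs say leading/trailing whitespace is meaningful.
    columns = partitioned_columns.split(",")
    # Only directory segments are inspected; the final segment is the file name.
    directories = PurePosixPath(path).parts[:-1]
    values = {}
    next_column = 0
    for segment in directories:
        name, separator, value = segment.partition("=")
        # Columns must appear in the same order as in the property.
        if separator and next_column < len(columns) and name == columns[next_column]:
            values[name] = value
            next_column += 1
    if next_column != len(columns):
        raise ValueError("path does not contain every partitioned column in order")
    return values
```

For the hypothetical path `data/year=2020/month=09/part-0.parquet` and the property value `year,month`, the sketch yields the string values `2020` and `09`. A property value of `year, month` (with a space) would fail to match, which mirrors the whitespace rule above.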

‎wildfly/SUMMARY.adoc

+1
@@ -307,6 +307,7 @@
 .... link:reference/r_netezza-translator.adoc[Netezza translator]
 .... link:reference/r_oracle-translator.adoc[Oracle translator]
 .... link:reference/r_osisoft-pi-translator.adoc[OSISoft PI translator]
+.... link:reference/r_parquet-translator.adoc[Parquet translator]
 .... link:reference/r_postgresql-translator.adoc[PostgreSQL translator]
 .... link:reference/r_prestodb-translator.adoc[PrestoDB translator]
 .... link:reference/r_redshift-translator.adoc[Redshift translator]
