[FLINK-30702] Add Elasticsearch dialect #67

Closed (4 commits)
12 changes: 12 additions & 0 deletions docs/content/docs/connectors/datastream/jdbc.md
@@ -334,3 +334,15 @@ Still not supported in Python API.
Please also take Oracle connection pooling into account.

Please refer to the `JdbcXaSinkFunction` documentation for more details.

## License of JDBC driver for Elasticsearch

Flink's JDBC connector defines a Maven dependency on the "JDBC driver for Elasticsearch", which is licensed under
the Elastic License 2.0.

Flink itself neither reuses source code from the "JDBC driver for Elasticsearch"
nor packages binaries from the "JDBC driver for Elasticsearch".

Users that create and publish derivative work based on Flink's JDBC connector (thereby re-distributing
the "JDBC driver for Elasticsearch") must be aware that this may be subject to conditions declared in
the Elastic License 2.0.
64 changes: 51 additions & 13 deletions docs/content/docs/connectors/table/jdbc.md
@@ -45,18 +45,18 @@ See how to link with it for cluster execution [here]({{< ref "docs/dev/configura

A driver dependency is also required to connect to a specified database. Here are drivers currently supported:

| Driver | Group Id | Artifact Id | JAR |
|:-----------|:---------------------------|:-----------------------|:----------------------------------------------------------------------------------------------------------------------------------|
| MySQL | `mysql` | `mysql-connector-java` | [Download](https://repo.maven.apache.org/maven2/mysql/mysql-connector-java/) |
| Oracle | `com.oracle.database.jdbc` | `ojdbc8` | [Download](https://mvnrepository.com/artifact/com.oracle.database.jdbc/ojdbc8) |
| PostgreSQL | `org.postgresql` | `postgresql` | [Download](https://jdbc.postgresql.org/download/) |
| Derby | `org.apache.derby` | `derby` | [Download](http://db.apache.org/derby/derby_downloads.html) |
| SQL Server | `com.microsoft.sqlserver` | `mssql-jdbc` | [Download](https://docs.microsoft.com/en-us/sql/connect/jdbc/download-microsoft-jdbc-driver-for-sql-server?view=sql-server-ver16) |
| CrateDB | `io.crate` | `crate-jdbc` | [Download](https://repo1.maven.org/maven2/io/crate/crate-jdbc/) |
| Db2 | `com.ibm.db2.jcc` | `db2jcc` | [Download](https://www.ibm.com/support/pages/download-db2-fix-packs-version-db2-linux-unix-and-windows) |
| Trino | `io.trino` | `trino-jdbc` | [Download](https://repo1.maven.org/maven2/io/trino/trino-jdbc/) |
| OceanBase | `com.oceanbase` | `oceanbase-client` | [Download](https://repo1.maven.org/maven2/com/oceanbase/oceanbase-client/) |

| Driver | Group Id | Artifact Id | JAR |
|:--------------|:---------------------------|:-----------------------|:----------------------------------------------------------------------------------------------------------------------------------|
| MySQL | `mysql` | `mysql-connector-java` | [Download](https://repo.maven.apache.org/maven2/mysql/mysql-connector-java/) |
| Oracle | `com.oracle.database.jdbc` | `ojdbc8` | [Download](https://mvnrepository.com/artifact/com.oracle.database.jdbc/ojdbc8) |
| PostgreSQL | `org.postgresql` | `postgresql` | [Download](https://jdbc.postgresql.org/download/) |
| Derby | `org.apache.derby` | `derby` | [Download](http://db.apache.org/derby/derby_downloads.html) |
| SQL Server | `com.microsoft.sqlserver` | `mssql-jdbc` | [Download](https://docs.microsoft.com/en-us/sql/connect/jdbc/download-microsoft-jdbc-driver-for-sql-server?view=sql-server-ver16) |
| CrateDB | `io.crate` | `crate-jdbc` | [Download](https://repo1.maven.org/maven2/io/crate/crate-jdbc/) |
| Db2 | `com.ibm.db2.jcc` | `db2jcc` | [Download](https://www.ibm.com/support/pages/download-db2-fix-packs-version-db2-linux-unix-and-windows) |
| Trino | `io.trino` | `trino-jdbc` | [Download](https://repo1.maven.org/maven2/io/trino/trino-jdbc/) |
| OceanBase | `com.oceanbase` | `oceanbase-client` | [Download](https://repo1.maven.org/maven2/com/oceanbase/oceanbase-client/) |
| Elasticsearch | `org.elasticsearch.plugin` | `x-pack-sql-jdbc` | [Download](https://www.elastic.co/downloads/jdbc-client) |

JDBC connector and drivers are not part of Flink's binary distribution. See how to link with them for cluster execution [here]({{< ref "docs/dev/configuration/overview" >}}).
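
Because neither the connector nor the drivers ship with Flink, a missing driver jar is an easy setup mistake to make. The following is a minimal sketch of a classpath check, assuming the driver class name `org.elasticsearch.xpack.sql.jdbc.EsDriver` that this PR's dialect returns from `defaultDriverName()`. The class `EsDriverCheck` and its method are hypothetical names used for illustration only; they are not part of Flink or of the Elasticsearch driver.

```java
public class EsDriverCheck {

    // Driver class name taken from ElasticsearchDialect#defaultDriverName() in this PR.
    public static final String ES_DRIVER = "org.elasticsearch.xpack.sql.jdbc.EsDriver";

    /**
     * Hypothetical helper: returns true if the named class can be located on the
     * classpath. The class is looked up without being initialized.
     */
    public static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, EsDriverCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The result depends on whether the x-pack-sql-jdbc jar is on the classpath.
        System.out.println(ES_DRIVER + " available: " + isOnClasspath(ES_DRIVER));
    }
}
```

Running this in the same environment as the Flink job gives a quick indication of whether the `x-pack-sql-jdbc` jar still needs to be added.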

@@ -656,7 +656,7 @@ SELECT * FROM `custom_schema.test_table2`;

Data Type Mapping
----------------
Flink supports connect to several databases which uses dialect like MySQL, Oracle, PostgreSQL, CrateDB, Derby, SQL Server, Db2 and OceanBase. The Derby dialect usually used for testing purpose. The field data type mappings from relational databases data types to Flink SQL data types are listed in the following table, the mapping table can help define JDBC table in Flink easily.
Flink supports connecting to several databases through dialects such as MySQL, Oracle, PostgreSQL, CrateDB, Derby, SQL Server, Db2, OceanBase, and Elasticsearch. The Derby dialect is usually used for testing purposes. The following table lists the mappings from relational database data types to Flink SQL data types, which helps when defining a JDBC table in Flink.

<table class="table table-bordered">
<thead>
@@ -670,6 +670,7 @@ Flink supports connect to several databases which uses dialect like MySQL, Oracl
<th class="text-left"><a href="https://trino.io/docs/current/language/types.html">Trino type</a></th>
<th class="text-left"><a href="https://en.oceanbase.com/docs/common-oceanbase-database-10000000001106898">OceanBase MySQL mode type</a></th>
<th class="text-left"><a href="https://en.oceanbase.com/docs/common-oceanbase-database-10000000001107076">OceanBase Oracle mode type</a></th>
<th class="text-left"><a href="https://www.elastic.co/guide/en/elasticsearch/reference/current/sql-data-types.html">Elastic SQL type</a></th>
<th class="text-left"><a href="{{< ref "docs/dev/table/types" >}}">Flink SQL type</a></th>
</tr>
</thead>
@@ -684,6 +685,7 @@ Flink supports connect to several databases which uses dialect like MySQL, Oracl
<td><code>TINYINT</code></td>
<td><code>TINYINT</code></td>
<td></td>
<td><code>BYTE</code></td>
<td><code>TINYINT</code></td>
</tr>
<tr>
@@ -706,6 +708,7 @@ Flink supports connect to several databases which uses dialect like MySQL, Oracl
<code>SMALLINT</code><br>
<code>TINYINT UNSIGNED</code></td>
<td></td>
<td><code>SHORT</code></td>
<td><code>SMALLINT</code></td>
</tr>
<tr>
@@ -728,6 +731,7 @@ Flink supports connect to several databases which uses dialect like MySQL, Oracl
<code>MEDIUMINT</code><br>
<code>SMALLINT UNSIGNED</code></td>
<td></td>
<td><code>INTEGER</code></td>
<td><code>INT</code></td>
</tr>
<tr>
@@ -748,6 +752,9 @@ Flink supports connect to several databases which uses dialect like MySQL, Oracl
<code>BIGINT</code><br>
<code>INT UNSIGNED</code></td>
<td></td>
<td>
<code>LONG</code><br>
<code>UNSIGNED_LONG</code></td>
<td><code>BIGINT</code></td>
</tr>
<tr>
@@ -760,6 +767,7 @@ Flink supports connect to several databases which uses dialect like MySQL, Oracl
<td></td>
<td><code>BIGINT UNSIGNED</code></td>
<td></td>
<td></td>
<td><code>DECIMAL(20, 0)</code></td>
</tr>
<tr>
@@ -778,6 +786,9 @@ Flink supports connect to several databases which uses dialect like MySQL, Oracl
<td><code>FLOAT</code></td>
<td>
<code>BINARY_FLOAT</code></td>
<td>
<code>FLOAT</code><br>
<code>HALF_FLOAT</code></td>
<td><code>FLOAT</code></td>
</tr>
<tr>
@@ -796,6 +807,9 @@ Flink supports connect to several databases which uses dialect like MySQL, Oracl
<td><code>DOUBLE</code></td>
<td><code>DOUBLE</code></td>
<td><code>BINARY_DOUBLE</code></td>
<td>
<code>DOUBLE</code><br>
<code>SCALED_FLOAT</code></td>
<td><code>DOUBLE</code></td>
</tr>
<tr>
@@ -824,6 +838,7 @@ Flink supports connect to several databases which uses dialect like MySQL, Oracl
<td>
<code>FLOAT(s)</code><br>
<code>NUMBER(p, s)</code></td>
<td></td>
<td><code>DECIMAL(p, s)</code></td>
</tr>
<tr>
@@ -841,6 +856,7 @@ Flink supports connect to several databases which uses dialect like MySQL, Oracl
<code>TINYINT(1)</code></td>
<td></td>
<td><code>BOOLEAN</code></td>
<td><code>BOOLEAN</code></td>
</tr>
<tr>
<td><code>DATE</code></td>
@@ -852,6 +868,7 @@ Flink supports connect to several databases which uses dialect like MySQL, Oracl
<td><code>DATE</code></td>
<td><code>DATE</code></td>
<td><code>DATE</code></td>
<td></td>
<td><code>DATE</code></td>
</tr>
<tr>
@@ -864,6 +881,7 @@ Flink supports connect to several databases which uses dialect like MySQL, Oracl
<td><code>TIME_WITHOUT_TIME_ZONE</code></td>
<td><code>TIME [(p)]</code></td>
<td><code>DATE</code></td>
<td></td>
<td><code>TIME [(p)] [WITHOUT TIMEZONE]</code></td>
</tr>
<tr>
@@ -879,6 +897,7 @@ Flink supports connect to several databases which uses dialect like MySQL, Oracl
<td><code>TIMESTAMP_WITHOUT_TIME_ZONE</code></td>
<td><code>DATETIME [(p)]</code></td>
<td><code>TIMESTAMP [(p)] [WITHOUT TIMEZONE]</code></td>
<td><code>DATETIME</code></td>
<td><code>TIMESTAMP [(p)] [WITHOUT TIMEZONE]</code></td>
</tr>
<tr>
@@ -927,6 +946,11 @@ Flink supports connect to several databases which uses dialect like MySQL, Oracl
<code>NCHAR(n)</code><br>
<code>VARCHAR2(n)</code><br>
<code>CLOB</code></td>
<td>
<code>KEYWORD</code><br>
<code>IP</code><br>
<code>TEXT</code><br>
<code>VERSION</code></td>
<td><code>STRING</code></td>
</tr>
<tr>
@@ -952,6 +976,7 @@ Flink supports connect to several databases which uses dialect like MySQL, Oracl
<td>
<code>RAW(s)</code><br>
<code>BLOB</code></td>
<td><code>BINARY</code></td>
<td><code>BYTES</code></td>
</tr>
<tr>
@@ -964,9 +989,22 @@ Flink supports connect to several databases which uses dialect like MySQL, Oracl
<td><code>ARRAY</code></td>
<td></td>
<td></td>
<td></td>
<td><code>ARRAY</code></td>
</tr>
</tbody>
</table>
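
The Elastic SQL column added to the table above can be read as a simple lookup from Elastic SQL type names to Flink SQL types. The sketch below restates only the rows that have an Elastic SQL entry in this diff. The class `ElasticTypeMapping` and the method `toFlinkType` are hypothetical names for illustration; the connector itself performs the conversion in its row converter, not through a map like this.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

public class ElasticTypeMapping {

    // Elastic SQL type name -> Flink SQL type, copied from the documentation table in this PR.
    private static final Map<String, String> ELASTIC_TO_FLINK = new LinkedHashMap<>();

    static {
        ELASTIC_TO_FLINK.put("BYTE", "TINYINT");
        ELASTIC_TO_FLINK.put("SHORT", "SMALLINT");
        ELASTIC_TO_FLINK.put("INTEGER", "INT");
        ELASTIC_TO_FLINK.put("LONG", "BIGINT");
        ELASTIC_TO_FLINK.put("UNSIGNED_LONG", "BIGINT");
        ELASTIC_TO_FLINK.put("FLOAT", "FLOAT");
        ELASTIC_TO_FLINK.put("HALF_FLOAT", "FLOAT");
        ELASTIC_TO_FLINK.put("DOUBLE", "DOUBLE");
        ELASTIC_TO_FLINK.put("SCALED_FLOAT", "DOUBLE");
        ELASTIC_TO_FLINK.put("BOOLEAN", "BOOLEAN");
        ELASTIC_TO_FLINK.put("DATETIME", "TIMESTAMP");
        ELASTIC_TO_FLINK.put("KEYWORD", "STRING");
        ELASTIC_TO_FLINK.put("IP", "STRING");
        ELASTIC_TO_FLINK.put("TEXT", "STRING");
        ELASTIC_TO_FLINK.put("VERSION", "STRING");
        ELASTIC_TO_FLINK.put("BINARY", "BYTES");
    }

    /** Hypothetical helper: case-insensitive lookup; empty if the table has no entry. */
    public static Optional<String> toFlinkType(String elasticType) {
        return Optional.ofNullable(ELASTIC_TO_FLINK.get(elasticType.toUpperCase(Locale.ROOT)));
    }

    public static void main(String[] args) {
        System.out.println(toFlinkType("keyword").orElse("(no mapping listed)"));
        System.out.println(toFlinkType("geo_point").orElse("(no mapping listed)"));
    }
}
```

Types without an Elastic SQL entry in the table (for example `DECIMAL`, `DATE`, `TIME`, and `ARRAY`) return an empty result here, mirroring the empty cells in the documentation.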

## License of JDBC driver for Elasticsearch

Flink's JDBC connector defines a Maven dependency on the "JDBC driver for Elasticsearch", which is licensed under
the Elastic License 2.0.

Flink itself neither reuses source code from the "JDBC driver for Elasticsearch"
nor packages binaries from the "JDBC driver for Elasticsearch".

Users that create and publish derivative work based on Flink's JDBC connector (thereby re-distributing
the "JDBC driver for Elasticsearch") must be aware that this may be subject to conditions declared in
the Elastic License 2.0.

{{< top >}}
34 changes: 34 additions & 0 deletions flink-connector-jdbc/pom.xml
@@ -42,6 +42,7 @@ under the License.
<oracle.version>21.8.0.0</oracle.version>
<trino.version>418</trino.version>
<byte-buddy.version>1.12.10</byte-buddy.version>
<elasticsearch.version>8.13.1</elasticsearch.version>
<surefire.module.config> <!-- required by
Db2ExactlyOnceSinkE2eTest --> --add-opens=java.base/java.util=ALL-UNNAMED <!--
SimpleJdbcConnectionProviderDriverClassConcurrentLoadingITCase--> --add-opens=java.base/java.lang=ALL-UNNAMED
@@ -114,6 +115,14 @@ under the License.
<scope>provided</scope>
</dependency>

<!-- Elasticsearch -->
<dependency>
<groupId>org.elasticsearch.plugin</groupId>
<artifactId>x-pack-sql-jdbc</artifactId>
<version>${elasticsearch.version}</version>
<scope>provided</scope>
</dependency>

<!-- Tests -->

<dependency>
@@ -250,6 +259,31 @@ under the License.
<scope>test</scope>
</dependency>

<!-- Elastic tests -->
<dependency>
<groupId>org.testcontainers</groupId>
<artifactId>elasticsearch</artifactId>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.elasticsearch.client</groupId>
<artifactId>elasticsearch-rest-client</artifactId>
<version>${elasticsearch.version}</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>jackson-databind</artifactId>
<version>2.13.4.2</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>com.fasterxml.jackson.datatype</groupId>
<artifactId>jackson-datatype-jsr310</artifactId>
<version>2.13.4</version>
<scope>test</scope>
</dependency>

<!-- ArchUnit test dependencies -->
<dependency>
<groupId>org.apache.flink</groupId>
@@ -0,0 +1,111 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

package org.apache.flink.connector.jdbc.databases.elasticsearch.dialect;

import org.apache.flink.annotation.Internal;
import org.apache.flink.connector.jdbc.converter.JdbcRowConverter;
import org.apache.flink.connector.jdbc.dialect.AbstractDialect;
import org.apache.flink.table.types.logical.LogicalTypeRoot;
import org.apache.flink.table.types.logical.RowType;

import java.util.EnumSet;
import java.util.Optional;
import java.util.Set;

/** JDBC dialect for Elasticsearch. */
@Internal
public class ElasticsearchDialect extends AbstractDialect {

private static final long serialVersionUID = 1L;

// Define MAX/MIN precision of TIMESTAMP type according to Elastic docs:
// https://www.elastic.co/guide/en/elasticsearch/reference/current/sql-data-types.html
private static final int MIN_TIMESTAMP_PRECISION = 0;
private static final int MAX_TIMESTAMP_PRECISION = 9;

@Override
public String dialectName() {
return "Elasticsearch";
}

@Override
public Optional<String> defaultDriverName() {
return Optional.of("org.elasticsearch.xpack.sql.jdbc.EsDriver");
}

@Override
public Set<LogicalTypeRoot> supportedTypes() {
// The list of types supported by Elastic SQL.
// https://www.elastic.co/guide/en/elasticsearch/reference/current/sql-data-types.html
return EnumSet.of(
LogicalTypeRoot.BIGINT,
LogicalTypeRoot.BOOLEAN,
LogicalTypeRoot.DATE,
LogicalTypeRoot.DOUBLE,
LogicalTypeRoot.INTEGER,
LogicalTypeRoot.FLOAT,
LogicalTypeRoot.SMALLINT,
LogicalTypeRoot.TINYINT,
LogicalTypeRoot.TIMESTAMP_WITHOUT_TIME_ZONE,
LogicalTypeRoot.VARBINARY,
LogicalTypeRoot.VARCHAR);
}

@Override
public Optional<Range> timestampPrecisionRange() {
return Optional.of(Range.of(MIN_TIMESTAMP_PRECISION, MAX_TIMESTAMP_PRECISION));
}

@Override
public JdbcRowConverter getRowConverter(RowType rowType) {
return new ElasticsearchRowConverter(rowType);
}

@Override
public String getLimitClause(long limit) {
return "LIMIT " + limit;
}

@Override
public String quoteIdentifier(String identifier) {
return '"' + identifier + '"';
}

@Override
public Optional<String> getUpsertStatement(
Member:
If read is the only option for ES, this should be in the documentation.

Author:
Do you mean docs/content/docs/connectors/table/jdbc.md? Which section is the most suitable one?

String tableName, String[] fieldNames, String[] uniqueKeyFields) {
throw new UnsupportedOperationException("Upsert is not supported.");
}

@Override
public String getInsertIntoStatement(String tableName, String[] fieldNames) {
throw new UnsupportedOperationException("Insert into is not supported.");
}

@Override
public String getUpdateStatement(
String tableName, String[] fieldNames, String[] conditionFields) {
throw new UnsupportedOperationException("Update is not supported.");
}

@Override
public String getDeleteStatement(String tableName, String[] conditionFields) {
throw new UnsupportedOperationException("Delete is not supported.");
}
}
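
The dialect above is read-only: it only contributes identifier quoting, a `LIMIT` clause, and a timestamp precision range, and every write statement throws. The following is a minimal, dependency-free sketch of that behavior, assuming nothing beyond what the diff shows. `ReadOnlyDialectSketch`, `selectWithLimit`, and `isSupportedTimestampPrecision` are hypothetical names for illustration; the real class extends Flink's `AbstractDialect`, which this sketch does not reproduce.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

public class ReadOnlyDialectSketch {

    // Same bounds as MIN/MAX_TIMESTAMP_PRECISION in the dialect above.
    static final int MIN_TIMESTAMP_PRECISION = 0;
    static final int MAX_TIMESTAMP_PRECISION = 9;

    /** Mirrors quoteIdentifier: wraps the identifier in double quotes. */
    public static String quoteIdentifier(String identifier) {
        return "\"" + identifier + "\"";
    }

    /** Mirrors getLimitClause. */
    public static String getLimitClause(long limit) {
        return "LIMIT " + limit;
    }

    /** Hypothetical helper: assembles a SELECT from the two fragments above. */
    public static String selectWithLimit(String tableName, String[] fieldNames, long limit) {
        String columns =
                Arrays.stream(fieldNames)
                        .map(ReadOnlyDialectSketch::quoteIdentifier)
                        .collect(Collectors.joining(", "));
        return "SELECT " + columns + " FROM " + quoteIdentifier(tableName)
                + " " + getLimitClause(limit);
    }

    /** Hypothetical helper: checks a precision against the declared range. */
    public static boolean isSupportedTimestampPrecision(int precision) {
        return precision >= MIN_TIMESTAMP_PRECISION && precision <= MAX_TIMESTAMP_PRECISION;
    }

    /** Mirrors getInsertIntoStatement: writes are rejected. */
    public static String getInsertIntoStatement(String tableName, String[] fieldNames) {
        throw new UnsupportedOperationException("Insert into is not supported.");
    }

    public static void main(String[] args) {
        System.out.println(selectWithLimit("my-index", new String[] {"id", "name"}, 10));
        try {
            getInsertIntoStatement("my-index", new String[] {"id"});
        } catch (UnsupportedOperationException e) {
            System.out.println("write rejected: " + e.getMessage());
        }
    }
}
```

Throwing from the write-statement methods means a sink defined on an Elasticsearch JDBC table fails when the statement is built, which is why the review comment above asks for the read-only limitation to be documented.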