Commit e355fbd (parent e0a63fb): Create CHANGELOG.md
1 file changed: CHANGELOG.md, 87 additions, 0 deletions

# Unreleased

First real release. It provides a Dialect that fixes several issues with
how Spark's generic JDBC data source interacts with MonetDB, and also a
MonetDB-specific data source that can upload data to MonetDB much faster
than the JDBC data source can.

The dialect is automatically activated when monetdb-spark-0.2.0-fat.jar
is on the classpath and the JDBC data source is invoked, for example via
`df.write.format("jdbc")`, `df.write.jdbc(...)`,
`spark.read.format("jdbc")`, or `spark.read.jdbc(...)`.

The data source can be activated by using format `"org.monetdb.spark"`
instead of format `"jdbc"`.

## Dialect

Note: the MonetDB JDBC driver and Spark JDBC Dialect are only
activated when the option `driver=org.monetdb.jdbc.MonetDriver`
is passed, for example
`df.write.format("jdbc").option("driver", "org.monetdb.jdbc.MonetDriver")`.

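Put together, a write through the generic JDBC path that picks up the dialect might look like the following sketch. The session setup, URL, table name, and data are placeholder assumptions for illustration; only the format and driver option come from this changelog.

```python
from pyspark.sql import SparkSession

# Placeholder session; monetdb-spark-0.2.0-fat.jar must be on the classpath.
spark = SparkSession.builder.appName("monetdb-dialect-example").getOrCreate()

df = spark.createDataFrame([(1, "alice"), (2, "bob")], ["id", "name"])

# Passing MonetDriver explicitly is what activates the dialect.
(df.write.format("jdbc")
   .option("url", "jdbc:monetdb://localhost:50000/demo")    # placeholder URL
   .option("dbtable", "people")                             # placeholder table
   .option("driver", "org.monetdb.jdbc.MonetDriver")
   .mode("append")
   .save())
```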

Notable type mappings:

- Spark BooleanType is mapped to SQL BOOLEAN by default, not BIT.

- Spark ByteType is mapped to SQL SMALLINT by default, not BYTE. We
  cannot use TINYINT by default because the range of ByteType is
  -128..127 and the range of TINYINT is -127..127.

- Spark ShortType is mapped to SQL INTEGER by default. Again, this
  widening is necessary because the range of SMALLINT does not
  completely cover the range of ShortType.

- Spark IntegerType is mapped to SQL BIGINT by default, again because
  of the ranges.

- Spark LongType is also mapped to SQL BIGINT even though the ranges do
  not entirely overlap. This is because the next larger SQL type,
  HUGEINT, is otherwise not really supported by Spark.

- Spark FloatType is mapped to SQL REAL by default, not SQL DOUBLE.

- Spark TimestampNTZType is mapped to SQL TIMESTAMP and
  Spark TimestampType is mapped to SQL TIMESTAMP WITH TIME ZONE.
  The timezone conversions have been fixed to work correctly.

The dialect does not provide a mapping from Spark DayTimeIntervalType
and Spark YearMonthIntervalType to the corresponding SQL INTERVAL types.
The JDBC data source does not seem to support this either.

- SQL INTERVAL types are mapped to the corresponding
  Spark DayTimeIntervalType and YearMonthIntervalType by default.

- SQL REAL is mapped to Spark FloatType by default, not DoubleType.

- SQL TINYINT is mapped to Spark ByteType by default, not IntegerType.

- SQL SMALLINT is mapped to Spark ShortType by default, not IntegerType.

- SQL TIMESTAMP is mapped to Spark TimestampNTZType by default, not
  TimestampType.

- SQL TIMESTAMP WITH TIME ZONE is mapped to Spark TimestampType by default.

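The write-side defaults above can be collected into a small lookup table. This is a plain-Python summary of the documented mappings, not code from the dialect itself:

```python
# Default Spark-to-MonetDB type mappings on the write path, summarizing
# the list above. Illustrative only, not the dialect's actual code.
WRITE_TYPE_MAP = {
    "BooleanType": "BOOLEAN",    # not BIT
    "ByteType": "SMALLINT",      # TINYINT cannot hold -128
    "ShortType": "INTEGER",      # SMALLINT does not cover all of ShortType
    "IntegerType": "BIGINT",     # widened because of the ranges
    "LongType": "BIGINT",        # HUGEINT is not really supported by Spark
    "FloatType": "REAL",         # not DOUBLE
    "TimestampNTZType": "TIMESTAMP",
    "TimestampType": "TIMESTAMP WITH TIME ZONE",
}

print(WRITE_TYPE_MAP["ByteType"])  # SMALLINT
```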
## org.monetdb.spark Writer

The "org.monetdb.spark" data source uses COPY BINARY INTO to insert data
into MonetDB rather than separate INSERT statements. This is about 20
times faster.

In this release, the data source cannot yet create tables, only append
data to existing tables. In other words, it supports `.mode('append')`
but not `.mode('overwrite')`.

The column types in the destination table do not have to exactly match
the column types of the source dataframe. For example, any integer-like
type can be written to any integer-like column. If the source range
does not fit the destination range, a range check is performed.
By default, any overflow aborts the upload, but if option
**allowoverflow** is set, overflowing values are replaced with NULLs.

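The documented overflow behavior can be illustrated with a plain-Python sketch. This mimics the described semantics for a SMALLINT destination column; it is not the data source's implementation, and the range bounds follow the minimum-value-reserved pattern noted for TINYINT above:

```python
# Illustrative sketch of the documented range check: values that do not
# fit the destination column either abort the upload or become NULL
# (None), depending on the allowoverflow option.
SMALLINT_MIN, SMALLINT_MAX = -32767, 32767  # assumed MonetDB SMALLINT range

def check_values(values, allowoverflow=False):
    out = []
    for v in values:
        if SMALLINT_MIN <= v <= SMALLINT_MAX:
            out.append(v)
        elif allowoverflow:
            out.append(None)  # overflowing value replaced with NULL
        else:
            raise ValueError(f"value {v} overflows destination column")
    return out

print(check_values([1, 40000], allowoverflow=True))  # [1, None]
```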
The data source does not support reading from MonetDB.

# v0.1.0 - 2025-07-16

This was a prototype release.
