
Commit 28525cc

razvan, xeniape, NickLarsenNZ, and Techassi authored
chore(spark): update versions for sdp 26.3.0 (#1402)
* add Spark 4.1.1 * delete 3.5.6 and 4.0.1 * split Dockerfiles per version * update changelog * cleanup and successful build * update spark-connect-client versions * new spark-k8s/hbase-connectors image * chore(hive): Small cleanup after 4.2.0 version addition (#1397) * chore(hive): Small cleanup after 4.2.0 version addition * add to the template * add to the template * chore: Remove ZooKeeper 3.9.3 (#1401) * chore: Remove ZooKeeper 3.9.3 * chore: Update changelog * ci(hive): Use Ubicloud runners (#1399) * ci(spark): Use Ubicloud runners (#1400) * update changelog * attempt to fix dockerfile lint * another attempt to fix lints * readd 4.0.1 * remove duplicate changelog entry --------- Co-authored-by: Xenia <xenia.fischer@stackable.tech> Co-authored-by: Nick <10092581+NickLarsenNZ@users.noreply.github.com> Co-authored-by: Techassi <sascha.lautenschlaeger@stackable.tech>
1 parent 5d36cb9 commit 28525cc

File tree

13 files changed: +454 −271 lines

CHANGELOG.md

Lines changed: 9 additions & 0 deletions
@@ -16,6 +16,9 @@ All notable changes to this project will be documented in this file.
 - opensearch-dashboards: Add `3.4.0` ([#1392]).
 - testing-tools: build testing tools subimages in workflow ([#1366]).
 - kafka: Add `4.1.1` ([#1395]).
+- spark: Add `4.1.1` ([#1402]).
+- spark-connect-client: Add `4.1.1` ([#1402]).
+- spark-k8s/hbase-connectors: new image extracted from spark dockerfile ([#1402]).
 - trino: Add `479` ([#1403]).

 ### Changed
@@ -29,13 +32,18 @@ All notable changes to this project will be documented in this file.
 - trino-cli: Bump to `479` ([#1403]).
 - ubi: Bumped ubi9 and ubi10 hashes ([#1386]).
 - vector: Bumped from 0.49.0 to 0.52.0 ([#1387]).
+- spark: Use one Dockerfile per major product version ([#1402]).
+  Remove all HBase dependencies from the Spark 4 image.
+  Pull logging dependencies with `mvn` instead of `curl` to remove manual maintenance in Nexus `packages`.

 ### Removed

 - airflow: Remove 2.10.5 and 3.0.1 ([#1405]).
 - opensearch: Remove the `performance-analyzer` plugin from the OpenSearch image ([#1357]).
 - superset: Remove 4.0.2 and 4.1.2 ([#1394]).
 - kafka: Remove `3.7.2` and `4.1.0` ([#1395]).
+- spark: Remove `3.5.6` ([#1402]).
+- spark-connect-client: Remove `3.5.6` ([#1402]).
 - opa: Remove `1.4.2` ([#1396]).
 - zookeeper: Remove `3.9.3` ([#1401]).
 - trino: Remove `451` and `476` ([#1403]).
@@ -67,6 +75,7 @@ All notable changes to this project will be documented in this file.
 [#1395]: https://github.com/stackabletech/docker-images/pull/1395
 [#1396]: https://github.com/stackabletech/docker-images/pull/1396
 [#1401]: https://github.com/stackabletech/docker-images/pull/1401
+[#1402]: https://github.com/stackabletech/docker-images/pull/1402
 [#1403]: https://github.com/stackabletech/docker-images/pull/1403
 [#1405]: https://github.com/stackabletech/docker-images/pull/1405

Lines changed: 9 additions & 9 deletions
@@ -1,10 +1,3 @@
-[versions."3.5.6".local-images]
-spark-k8s = "3.5.6"
-java-base = "17"
-
-[versions."3.5.6".build-arguments]
-python-version = "3.11"
-
 [versions."3.5.7".local-images]
 spark-k8s = "3.5.7"
 java-base = "17"
@@ -14,7 +7,14 @@ python-version = "3.11"

 [versions."4.0.1".local-images]
 spark-k8s = "4.0.1"
-java-base = "17"
+java-base = "21"

 [versions."4.0.1".build-arguments]
-python-version = "3.11"
+python-version = "3.12"
+
+[versions."4.1.1".local-images]
+spark-k8s = "4.1.1"
+java-base = "21"
+
+[versions."4.1.1".build-arguments]
+python-version = "3.12"
Lines changed: 20 additions & 138 deletions
@@ -33,106 +33,7 @@ EOF

 # hbase-connectors-builder: Build the Spark HBase connector and copy
 # required JARs into /stackable/spark/jars
-FROM local-image/java-devel AS hbase-connectors-builder
-
-ARG PRODUCT_VERSION
-ARG RELEASE_VERSION
-ARG HADOOP_HADOOP_VERSION
-# Reassign the arg to `HADOOP_VERSION` for better readability.
-ENV HADOOP_VERSION=${HADOOP_HADOOP_VERSION}
-ARG HBASE_VERSION
-ARG HBASE_CONNECTOR_VERSION
-ARG STACKABLE_USER_UID
-
-WORKDIR /stackable
-
-# Copy the pom.xml file from the patched Spark source code to read the
-# versions used by Spark. The pom.xml defines child modules which are
-# not required and not copied, therefore mvn must be called with the
-# parameter --non-recursive.
-COPY --chown=${STACKABLE_USER_UID}:0 --from=spark-source-builder \
-    /stackable/src/spark-k8s/patchable-work/worktree/${PRODUCT_VERSION}/pom.xml \
-    spark/
-
-# Patch the hbase-connectors source code
-WORKDIR /stackable
-
-COPY --chown=${STACKABLE_USER_UID}:0 spark-k8s/hbase-connectors/stackable/patches/patchable.toml /stackable/src/spark-k8s/hbase-connectors/stackable/patches/patchable.toml
-COPY --chown=${STACKABLE_USER_UID}:0 spark-k8s/hbase-connectors/stackable/patches/${HBASE_CONNECTOR_VERSION} /stackable/src/spark-k8s/hbase-connectors/stackable/patches/${HBASE_CONNECTOR_VERSION}
-
-RUN <<EOF
-
-# IMPORTANT: HBase connectors don't support Spark 4 yet, so we skip the build.
-# Watch this PR for updates: https://github.com/apache/hbase-connectors/pull/130
-if [[ "${PRODUCT_VERSION}" == 4* ]]; then
-    # Create this empty directory so that following COPY layers succeed.
-    mkdir -p /stackable/spark/jars
-    # Create a dummy tarball to satisfy the build process for Spark 3.
-    touch hbase-connector-${HBASE_CONNECTOR_VERSION}-stackable${RELEASE_VERSION}-src.tar.gz
-    exit 0
-fi
-
-cd "$(/stackable/patchable --images-repo-root=src checkout spark-k8s/hbase-connectors ${HBASE_CONNECTOR_VERSION})/spark"
-
-NEW_VERSION="${HBASE_CONNECTOR_VERSION}-stackable${RELEASE_VERSION}"
-
-mvn versions:set -DnewVersion=$NEW_VERSION
-
-# Create snapshot of the source code including custom patches
-tar -czf /stackable/hbase-connector-${HBASE_CONNECTOR_VERSION}-stackable${RELEASE_VERSION}-src.tar.gz .
-
-# Building the hbase-connectors with JDK 17 is not yet supported, see
-# https://github.com/apache/hbase-connectors/pull/132.
-# As there are no JDK profiles, access to the non-public elements must
-# be enabled with --add-opens, see https://openjdk.org/jeps/403 and
-# https://openjdk.org/jeps/261#Breaking-encapsulation.
-export JDK_JAVA_OPTIONS="\
-    --add-opens java.base/java.lang=ALL-UNNAMED \
-    --add-opens java.base/java.util=ALL-UNNAMED"
-
-# Get the Scala version used by Spark
-SCALA_VERSION=$(grep "scala.version" /stackable/spark/pom.xml | head -n1 | awk -F '[<>]' '{print $3}')
-
-# Get the Scala binary version used by Spark
-SCALA_BINARY_VERSION=$(grep "scala.binary.version" /stackable/spark/pom.xml | head -n1 | awk -F '[<>]' '{print $3}')
-
-# Build the Spark HBase connector
-# Skip the tests because the MiniHBaseCluster does not get ready for
-# whatever reason:
-# Caused by: java.lang.RuntimeException: Master not active after 30000ms
-#   at org.apache.hadoop.hbase.util.JVMClusterUtil.waitForEvent(JVMClusterUtil.java:221)
-#   at org.apache.hadoop.hbase.util.JVMClusterUtil.startup(JVMClusterUtil.java:177)
-#   at org.apache.hadoop.hbase.LocalHBaseCluster.startup(LocalHBaseCluster.java:407)
-#   at org.apache.hadoop.hbase.MiniHBaseCluster.init(MiniHBaseCluster.java:250)
-mvn \
-    --batch-mode \
-    --no-transfer-progress \
-    --define spark.version="${PRODUCT_VERSION}" \
-    --define scala.version="${SCALA_VERSION}" \
-    --define scala.binary.version="${SCALA_BINARY_VERSION}" \
-    --define hadoop-three.version="${HADOOP_VERSION}" \
-    --define hbase.version="${HBASE_VERSION}" \
-    --define skipTests \
-    --define maven.test.skip=true \
-    clean package
-
-mkdir -p /stackable/spark/jars
-ln -s "$(pwd)/hbase-spark/target/hbase-spark-${HBASE_CONNECTOR_VERSION}-stackable${RELEASE_VERSION}.jar" /stackable/spark/jars/hbase-spark-${HBASE_CONNECTOR_VERSION}-stackable${RELEASE_VERSION}.jar
-
-cd /stackable/spark/jars
-
-# Download log4j-slf4j-impl-x.x.x.jar containing the StaticLoggerBinder
-# which is required by the connector.
-# Spark contains only log4j-slf4j2-impl-x.x.x.jar but not
-# log4j-slf4j-impl-x.x.x.jar. It is okay to have both JARs in the
-# classpath as long as they have the same version.
-mvn --non-recursive --file /stackable/spark/pom.xml \
-    dependency:copy \
-    -Dartifact=org.apache.logging.log4j:log4j-slf4j-impl:'${log4j.version}' \
-    -DoutputDirectory=./jars
-chmod g=u /stackable/hbase-connector-${HBASE_CONNECTOR_VERSION}-stackable${RELEASE_VERSION}-src.tar.gz .
-EOF
-
+FROM local-image/spark-k8s/hbase-connectors AS hbase-connectors-builder

 # spark-builder: Build Spark into /stackable/spark-${PRODUCT_VERSION}/dist,
 # download additional JARs and perform checks
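The one-line stage above only works because the dedicated `spark-k8s/hbase-connectors` image ships the artifacts the deleted ~100-line build produced, so downstream `COPY --from=hbase-connectors-builder` layers can stay untouched. A sketch of that implied contract, with source paths taken from the removed code (the destinations and purposes are assumptions):

```dockerfile
# Connector jar built against the matching Spark/HBase/Hadoop versions:
COPY --from=hbase-connectors-builder /stackable/spark/jars /stackable/spark/jars
# Snapshot tarball of the patched connector sources:
COPY --from=hbase-connectors-builder /stackable/hbase-connector-*-src.tar.gz /stackable/
```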
@@ -173,26 +74,11 @@ RUN <<EOF
 MAVEN_BIN="/usr/bin/mvn"
 export MAVEN_OPTS="-Xss64m -Xmx2g -XX:ReservedCodeCacheSize=1g"

-case "${PRODUCT_VERSION}" in
-    4*)
-        # The Spark 4 script has a --connect option which is not available in Spark 3.
-        # This option is required to build Spark Connect.
-        # Also this option breaks the Spark 3 build so we ensure it's only provided here.
-        ./dev/make-distribution.sh \
-            --mvn "${MAVEN_BIN}" \
-            --connect \
-            -Dhadoop.version="${HADOOP_VERSION}-stackable${RELEASE_VERSION}" \
-            -DskipTests \
-            -P'hadoop-3' -Pkubernetes -Phive -Phive-thriftserver
-        ;;
-    *)
-        ./dev/make-distribution.sh \
-            --mvn "${MAVEN_BIN}" \
-            -Dhadoop.version="${HADOOP_VERSION}-stackable${RELEASE_VERSION}" \
-            -DskipTests \
-            -P'hadoop-3' -Pkubernetes -Phive -Phive-thriftserver
-        ;;
-esac
+./dev/make-distribution.sh \
+    --mvn "${MAVEN_BIN}" \
+    -Dhadoop.version="${HADOOP_VERSION}-stackable${RELEASE_VERSION}" \
+    -DskipTests \
+    -P'hadoop-3' -Pkubernetes -Phive -Phive-thriftserver

 sed -i "s/${NEW_VERSION}/${ORIGINAL_VERSION}/g" assembly/target/bom.json
 EOF
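With one Dockerfile per major version, the removed `case` dispatch is no longer needed here: the Spark 3 invocation survives unconditionally, and the `--connect` flag (Spark 4 only, required to build Spark Connect, and fatal to a Spark 3 build) presumably lives on in the Spark 4 Dockerfile. A sketch of that assumed counterpart, reusing the invocation from the removed branch; ARG/ENV values such as `HADOOP_VERSION` come from the surrounding Dockerfile:

```dockerfile
RUN <<EOF
# Assumed Spark 4 variant: identical build, plus --connect, which only the
# Spark 4 make-distribution.sh understands.
MAVEN_BIN="/usr/bin/mvn"
./dev/make-distribution.sh \
    --mvn "${MAVEN_BIN}" \
    --connect \
    -Dhadoop.version="${HADOOP_VERSION}-stackable${RELEASE_VERSION}" \
    -DskipTests \
    -P'hadoop-3' -Pkubernetes -Phive -Phive-thriftserver
EOF
```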
@@ -206,18 +92,9 @@ RUN <<EOF
 mkdir -p dist/connect
 cd dist/connect

-case "${PRODUCT_VERSION}" in
-    4*)
-        cp "/stackable/spark-${PRODUCT_VERSION}-stackable${RELEASE_VERSION}/sql/connect/server/target/spark-connect_${SCALA_BINARY_VERSION}-${PRODUCT_VERSION}-stackable${RELEASE_VERSION}.jar" .
-        cp "/stackable/spark-${PRODUCT_VERSION}-stackable${RELEASE_VERSION}/sql/connect/common/target/spark-connect-common_${SCALA_BINARY_VERSION}-${PRODUCT_VERSION}-stackable${RELEASE_VERSION}.jar" .
-        cp "/stackable/spark-${PRODUCT_VERSION}-stackable${RELEASE_VERSION}/sql/connect/client/jvm/target/spark-connect-client-jvm_${SCALA_BINARY_VERSION}-${PRODUCT_VERSION}-stackable${RELEASE_VERSION}.jar" .
-        ;;
-    *)
-        cp "/stackable/spark-${PRODUCT_VERSION}-stackable${RELEASE_VERSION}/connector/connect/server/target/spark-connect_${SCALA_BINARY_VERSION}-${PRODUCT_VERSION}-stackable${RELEASE_VERSION}.jar" .
-        cp "/stackable/spark-${PRODUCT_VERSION}-stackable${RELEASE_VERSION}/connector/connect/common/target/spark-connect-common_${SCALA_BINARY_VERSION}-${PRODUCT_VERSION}-stackable${RELEASE_VERSION}.jar" .
-        cp "/stackable/spark-${PRODUCT_VERSION}-stackable${RELEASE_VERSION}/connector/connect/client/jvm/target/spark-connect-client-jvm_${SCALA_BINARY_VERSION}-${PRODUCT_VERSION}-stackable${RELEASE_VERSION}.jar" .
-        ;;
-esac
+cp "/stackable/spark-${PRODUCT_VERSION}-stackable${RELEASE_VERSION}/connector/connect/server/target/spark-connect_${SCALA_BINARY_VERSION}-${PRODUCT_VERSION}-stackable${RELEASE_VERSION}.jar" .
+cp "/stackable/spark-${PRODUCT_VERSION}-stackable${RELEASE_VERSION}/connector/connect/common/target/spark-connect-common_${SCALA_BINARY_VERSION}-${PRODUCT_VERSION}-stackable${RELEASE_VERSION}.jar" .
+cp "/stackable/spark-${PRODUCT_VERSION}-stackable${RELEASE_VERSION}/connector/connect/client/jvm/target/spark-connect-client-jvm_${SCALA_BINARY_VERSION}-${PRODUCT_VERSION}-stackable${RELEASE_VERSION}.jar" .

 # This link is needed by the operator and is kept for backwards compatibility.
 # TODO: remove it at some time in the future.
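Likewise here: the removed Spark 4 branch differed from the surviving Spark 3 code only in where upstream relocated the Connect modules (`sql/connect/` instead of `connector/connect/`), so the Spark 4 Dockerfile presumably copies the same three jars from the new paths. Sketch under that assumption, with version variables supplied by the surrounding Dockerfile:

```dockerfile
RUN <<EOF
# Assumed Spark 4 counterpart: same three jars, relocated under sql/connect/.
cd dist/connect
cp "/stackable/spark-${PRODUCT_VERSION}-stackable${RELEASE_VERSION}/sql/connect/server/target/spark-connect_${SCALA_BINARY_VERSION}-${PRODUCT_VERSION}-stackable${RELEASE_VERSION}.jar" .
cp "/stackable/spark-${PRODUCT_VERSION}-stackable${RELEASE_VERSION}/sql/connect/common/target/spark-connect-common_${SCALA_BINARY_VERSION}-${PRODUCT_VERSION}-stackable${RELEASE_VERSION}.jar" .
cp "/stackable/spark-${PRODUCT_VERSION}-stackable${RELEASE_VERSION}/sql/connect/client/jvm/target/spark-connect-client-jvm_${SCALA_BINARY_VERSION}-${PRODUCT_VERSION}-stackable${RELEASE_VERSION}.jar" .
EOF
```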
@@ -272,12 +149,17 @@ WORKDIR /stackable/spark-${PRODUCT_VERSION}-stackable${RELEASE_VERSION}/dist/ext

 RUN <<EOF
 # Download jackson-dataformat-xml, stax2-api, and woodstox-core which are required for logging.
-curl --fail https://repo.stackable.tech/repository/packages/jackson-dataformat-xml/jackson-dataformat-xml-${JACKSON_DATAFORMAT_XML_VERSION}.jar \
-    -o /stackable/spark-${PRODUCT_VERSION}-stackable${RELEASE_VERSION}/dist/extra-jars/jackson-dataformat-xml-${JACKSON_DATAFORMAT_XML_VERSION}.jar
-curl --fail https://repo.stackable.tech/repository/packages/stax2-api/stax2-api-${STAX2_API_VERSION}.jar \
-    -o /stackable/spark-${PRODUCT_VERSION}-stackable${RELEASE_VERSION}/dist/extra-jars/stax2-api-${STAX2_API_VERSION}.jar
-curl --fail https://repo.stackable.tech/repository/packages/woodstox-core/woodstox-core-${WOODSTOX_CORE_VERSION}.jar \
-    -o /stackable/spark-${PRODUCT_VERSION}-stackable${RELEASE_VERSION}/dist/extra-jars/woodstox-core-${WOODSTOX_CORE_VERSION}.jar
+mvn dependency:get -Dartifact=com.fasterxml.jackson.dataformat:jackson-dataformat-xml:${JACKSON_DATAFORMAT_XML_VERSION}
+cp /root/.m2/repository/com/fasterxml/jackson/dataformat/jackson-dataformat-xml/${JACKSON_DATAFORMAT_XML_VERSION}/jackson-dataformat-xml-${JACKSON_DATAFORMAT_XML_VERSION}.jar \
+    /stackable/spark-${PRODUCT_VERSION}-stackable${RELEASE_VERSION}/dist/extra-jars/jackson-dataformat-xml-${JACKSON_DATAFORMAT_XML_VERSION}.jar
+
+mvn dependency:get -Dartifact=org.codehaus.woodstox:stax2-api:${STAX2_API_VERSION}
+cp /root/.m2/repository/org/codehaus/woodstox/stax2-api/${STAX2_API_VERSION}/stax2-api-${STAX2_API_VERSION}.jar \
+    /stackable/spark-${PRODUCT_VERSION}-stackable${RELEASE_VERSION}/dist/extra-jars/stax2-api-${STAX2_API_VERSION}.jar
+
+mvn dependency:get -Dartifact=com.fasterxml.woodstox:woodstox-core:${WOODSTOX_CORE_VERSION}
+cp /root/.m2/repository/com/fasterxml/woodstox/woodstox-core/${WOODSTOX_CORE_VERSION}/woodstox-core-${WOODSTOX_CORE_VERSION}.jar \
+    /stackable/spark-${PRODUCT_VERSION}-stackable${RELEASE_VERSION}/dist/extra-jars/woodstox-core-${WOODSTOX_CORE_VERSION}.jar

 # Get the correct `tini` binary for our architecture.
 curl --fail "https://repo.stackable.tech/repository/packages/tini/tini-${TINI_VERSION}-${TARGETARCH}" \
