Commit

Merge pull request #485 from NvTimLiu/merge-22.08-to-main
Merge 22.08 to main [skip ci]
pxLi authored Aug 18, 2022
2 parents aec090a + e0a6fed commit b2a39ca
Showing 26 changed files with 6,284 additions and 493 deletions.
6 changes: 3 additions & 3 deletions .github/workflows/auto-merge.yml
@@ -18,12 +18,12 @@ name: auto-merge HEAD to BASE
on:
pull_request_target:
branches:
- branch-22.06
- branch-22.08
types: [closed]

env:
HEAD: branch-22.06
BASE: branch-22.08
HEAD: branch-22.08
BASE: branch-22.10

jobs:
auto-merge:
2 changes: 1 addition & 1 deletion .gitmodules
@@ -1,4 +1,4 @@
[submodule "thirdparty/cudf"]
path = thirdparty/cudf
url = https://github.com/rapidsai/cudf.git
branch = branch-22.06
branch = branch-22.08
136 changes: 136 additions & 0 deletions CONTRIBUTING.md
@@ -74,10 +74,128 @@ to control aspects of the build:
|`CUDF_USE_PER_THREAD_DEFAULT_STREAM`|CUDA per-thread default stream |ON |
|`RMM_LOGGING_LEVEL` |RMM logging control |OFF |
|`USE_GDS` |Compile with GPU Direct Storage support|OFF |
|`BUILD_TESTS` |Compile tests |OFF |
|`BUILD_BENCHMARKS` |Compile benchmarks |OFF |
|`libcudf.build.configure` |Force libcudf build to configure |false |
|`libcudf.clean.skip` |Whether to skip cleaning libcudf build |true |
|`submodule.check.skip` |Whether to skip checking git submodules|false |
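For orientation, several of these options can be combined on one Maven command line. The sketch below is illustrative only: the flag values are examples rather than recommended defaults, and the script prints the assembled command instead of running it.

```shell
# Assemble an illustrative set of build flags (values are examples, not defaults).
MVN_FLAGS=(
  -DCUDF_USE_PER_THREAD_DEFAULT_STREAM=ON
  -DUSE_GDS=OFF
  -DBUILD_TESTS=ON
  -DBUILD_BENCHMARKS=ON
  -Dlibcudf.build.configure=true
  -Dlibcudf.clean.skip=false
  -Dsubmodule.check.skip=false
)
# Print the full command rather than executing it, so the sketch is side-effect free.
echo mvn clean install "${MVN_FLAGS[@]}"
```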


### Local testing of cross-repo contributions to cuDF, spark-rapids-jni, and spark-rapids

When we work on a feature or a bug fix that spans repositories, it is useful to run manual and
integration tests end to end on the full stack (Apache Spark with the spark-rapids plugin)
before merging the PRs.

So we are dealing with a subset of the following:

Local PR branches for
- rapidsai/cuDF, branch pr1
- NVIDIA/spark-rapids-jni, branch pr2
- NVIDIA/spark-rapids, branch pr3

Our end goal is to build the rapids-4-spark dist jar in the pr3 branch under local repo path
~/repos/NVIDIA/spark-rapids that includes changes from the pr2 branch in
~/repos/NVIDIA/spark-rapids-jni and the pr1 branch in rapidsai/cuDF that we will test
with Spark. There are two options for working on pr1.

#### Option 1: Working on the cuDF PR inside the submodule in spark-rapids-jni
To avoid retargeting the submodule to a local cuDF repo (as in Option 2 below), it may be easier
to make changes directly under ~/repos/NVIDIA/spark-rapids-jni/thirdparty/cudf.

In order to push pr1 and create a pull request, we need to add a remote for the cuDF fork in our
account to the submodule:

```bash
$ cd ~/repos/NVIDIA/spark-rapids-jni/thirdparty/cudf
$ git remote add <user> git@github.com:<user>/cudf.git
# make and commit changes
$ git push <user>
```

#### Option 2: Working on cuDF PR in a conventional local cuDF fork
Once we are done with our changes to the pr1 branch in
~/repos/rapidsai/cuDF, we git commit changes locally.

Then we cd to ~/repos/NVIDIA/spark-rapids-jni and point the cudf submodule temporarily to the pr1
branch

```bash
$ git submodule set-url thirdparty/cudf ~/repos/rapidsai/cudf
$ git submodule set-branch --branch pr1 thirdparty/cudf
```

Sync pr1 into our pr2 branch in ~/repos/NVIDIA/spark-rapids-jni
```bash
$ git submodule sync --recursive
$ git submodule update --init --recursive --remote
```
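For a concrete picture of what the retargeting records, here is a self-contained sketch run in a
scratch directory. The `set-url`/`set-branch` effects are reproduced with plain `git config` writes
to `.gitmodules`, and the path and branch name (`pr1`) are the hypothetical ones from this
walkthrough.

```shell
# Scratch-directory demo of the .gitmodules entries the retargeting produces.
workdir=$(mktemp -d)
cd "$workdir"
cat > .gitmodules <<'EOF'
[submodule "thirdparty/cudf"]
	path = thirdparty/cudf
	url = https://github.com/rapidsai/cudf.git
	branch = branch-22.08
EOF
# Equivalent of: git submodule set-url thirdparty/cudf ~/repos/rapidsai/cudf
git config --file .gitmodules submodule.thirdparty/cudf.url "$HOME/repos/rapidsai/cudf"
# Equivalent of: git submodule set-branch --branch pr1 thirdparty/cudf
git config --file .gitmodules submodule.thirdparty/cudf.branch pr1
# Read the branch back to confirm the edit.
git config --file .gitmodules submodule.thirdparty/cudf.branch
```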

#### Building final spark-rapids artifact with pr1, pr2, and pr3 changes
Regardless of which option we used to make the cuDF changes, we proceed with building
spark-rapids-jni. The spark-rapids repo will consume spark-rapids-jni with the pr1 and pr2 changes
from the local Maven cache after we run `mvn install` via `build/build-in-docker`
in ~/repos/NVIDIA/spark-rapids-jni.

Make sure to stage thirdparty/cudf with `git add` to satisfy the build's submodule check.
```bash
$ git add thirdparty/cudf
$ ./build/build-in-docker install ...
```

Now cd to ~/repos/NVIDIA/spark-rapids and build with one of the options from
[spark-rapids instructions](https://github.com/NVIDIA/spark-rapids/blob/branch-22.08/CONTRIBUTING.md#building-from-source).

```bash
$ ./build/buildall
```

Since we rely on the local Maven cache, we need to pay extra attention to make sure that
the final rapids-4-spark artifact includes the locally built dependencies as opposed to
CI-built snapshot dependencies from the remote Maven repo. Remote snapshots can slip in even
when Maven is invoked with the `--offline` or `--no-snapshot-updates` option, due to IDE-Maven
interactions in the background. To confirm that the artifact is correct we can either enable
[INFO logging in Spark](https://github.com/NVIDIA/spark-rapids/blob/4c77f0db58d229b2e6cb75c196934fcc0ae3a485/sql-plugin/src/main/scala/com/nvidia/spark/rapids/Plugin.scala#L73-L83)
or directly inspect the resulting jar for build info:
```bash
$ unzip -c dist/target/rapids-4-spark_2.12-22.08.0-SNAPSHOT-cuda11.jar *version-info.properties
Archive: dist/target/rapids-4-spark_2.12-22.08.0-SNAPSHOT-cuda11.jar
inflating: cudf-java-version-info.properties
version=22.08.0-SNAPSHOT
user=
revision=62657ad6a296ea3547417504652e3b8836b020fb
branch=testCUDF_pr1
date=2022-07-19T21:48:15Z
url=https://github.com/rapidsai/cudf.git

inflating: spark-rapids-jni-version-info.properties
version=22.08.0-SNAPSHOT
user=
revision=70adcc86a513ad6665968021c669fbca7515a188
branch=pr/user1/381
date=2022-07-19T21:48:15Z
url=git@github.com:NVIDIA/spark-rapids-jni.git

inflating: rapids4spark-version-info.properties
version=22.08.0-SNAPSHOT
cudf_version=22.08.0-SNAPSHOT
user=user1
revision=6453047ef479b5ec79384c5150c50af2f50f563e
branch=aqeFinalPlanOnGPUDoc
date=2022-07-19T21:51:52Z
url=https://github.com/NVIDIA/spark-rapids
```
and verify that the branch names and revisions in the console output
correspond to the local repos.
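This check can also be scripted. Below is a sketch: the `extract_revision` helper is ours, not part
of the build, and it is demonstrated on an inline sample rather than a real jar. In practice you
would feed it from `unzip -p <jar> '*version-info.properties'` and compare the result with
`git -C ~/repos/NVIDIA/spark-rapids-jni rev-parse HEAD`.

```shell
# Pull the revision= value out of a version-info.properties stream.
extract_revision() { awk -F= '$1 == "revision" { print $2 }'; }

# Inline sample standing in for the unzip output shown above.
sample='version=22.08.0-SNAPSHOT
revision=70adcc86a513ad6665968021c669fbca7515a188
branch=pr/user1/381'

printf '%s\n' "$sample" | extract_revision
```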

When we are ready to move on, prior to switching to another spark-rapids-jni branch
or submitting a PR to NVIDIA/spark-rapids-jni, we should undo the cudf submodule modifications.
```bash
$ cd ~/repos/NVIDIA/spark-rapids-jni
$ git restore .gitmodules
$ git restore --staged thirdparty/cudf
```
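As a self-contained illustration of the `git restore .gitmodules` step, the sketch below runs in a
scratch repo with throwaway file contents; only the restore mechanics carry over to the real
spark-rapids-jni checkout.

```shell
# Scratch-repo demo: a committed .gitmodules, a local edit, and its restore.
workdir=$(mktemp -d)
cd "$workdir"
git init -q .
echo 'branch = branch-22.08' > .gitmodules
git add .gitmodules
git -c user.email=ci@example.invalid -c user.name=ci commit -qm 'add .gitmodules'
echo 'branch = pr1' > .gitmodules   # the temporary retargeting edit
git restore .gitmodules             # discard the working-tree change
cat .gitmodules
```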

### Building on Windows in WSL2
Building on Windows can be done if your Windows build version supports
[WSL2](https://docs.microsoft.com/en-us/windows/wsl/install). You can create a minimum
@@ -93,6 +211,24 @@ and build inside WSL2, e.g.
> wsl -d Ubuntu ./build/build-in-docker clean install -DGPU_ARCHS=NATIVE -Dtest="*,!CuFileTest"
```

### Testing
Java tests are in the `src/test` directory and C++ tests are in the `src/main/cpp/tests` directory.
The C++ tests are built with the `-DBUILD_TESTS` command-line option and build into the
`target/cmake-build/gtests/` directory. Because the build runs inside a Docker container, the host
environment may not match the container well enough to run these executables, resulting in errors
finding libraries. The script `build/run-in-docker` was created to help with this situation. A test
can be run directly using this script, or the script can be run without any arguments to get an
interactive shell inside the container.
```bash
build/run-in-docker target/cmake-build/gtests/ROW_CONVERSION
```

### Benchmarks
C++ benchmarks are written using NVBench and are in the `src/main/cpp/benchmarks` directory.
Building them requires the `-DBUILD_BENCHMARKS` build option. Once built, the benchmarks can be
found in the `target/cmake-build/benchmarks/` directory. Because the build runs inside a Docker
container, the host environment may not match the container well enough to run these executables,
resulting in errors finding libraries. The script `build/run-in-docker` was created to help with
this situation. A benchmark can be run directly using this script, or the script can be run without
any arguments to get an interactive shell inside the container.
```bash
build/run-in-docker target/cmake-build/benchmarks/ROW_CONVERSION_BENCH
```
## Code contributions

### Your first issue
48 changes: 3 additions & 45 deletions build/build-in-docker
@@ -22,35 +22,17 @@ set -e

# Base paths relative to this script's location
SCRIPTDIR=$(cd $(dirname $0); pwd)
REPODIR=$SCRIPTDIR/..

CMAKE_GENERATOR=${CMAKE_GENERATOR:-Ninja}
CUDA_VERSION=${CUDA_VERSION:-11.5.0}
DOCKER_CMD=${DOCKER_CMD:-docker}
LOCAL_CCACHE_DIR=${LOCAL_CCACHE_DIR:-"$HOME/.ccache"}
LOCAL_MAVEN_REPO=${LOCAL_MAVEN_REPO:-"$HOME/.m2/repository"}
CUDF_USE_PER_THREAD_DEFAULT_STREAM=${CUDF_USE_PER_THREAD_DEFAULT_STREAM:-ON}
USE_GDS=${USE_GDS:-ON}

SPARK_IMAGE_NAME="spark-rapids-jni-build:${CUDA_VERSION}-devel-centos7"
WORKSPACE_DIR=/rapids
WORKSPACE_REPODIR="$WORKSPACE_DIR/spark-rapids-jni"
WORKSPACE_CCACHE_REPODIR="$WORKSPACE_DIR/.ccache"
WORKSPACE_MAVEN_REPODIR="$WORKSPACE_DIR/.m2/repository"
export CMAKE_GENERATOR=${CMAKE_GENERATOR:-"Ninja"}

if (( $# == 0 )); then
echo "Usage: $0 <Maven build arguments>"
exit 1
fi

# ensure directories exist
mkdir -p "$LOCAL_CCACHE_DIR" "$LOCAL_MAVEN_REPO"

$DOCKER_CMD build -f $REPODIR/ci/Dockerfile \
--build-arg CUDA_VERSION=$CUDA_VERSION \
-t $SPARK_IMAGE_NAME \
$REPODIR/build

_CUDF_CLEAN_SKIP=""
# if ccache is enabled and libcudf.clean.skip not provided
# by the user remove the cpp build directory
@@ -63,32 +45,8 @@ if [[ "$CCACHE_DISABLE" != "1" ]]; then
fi
fi

if [[ "$DOCKER_CMD" == "docker" ]]; then
DOCKER_GPU_OPTS="--gpus all"
fi

$DOCKER_CMD run $DOCKER_GPU_OPTS -it -u $(id -u):$(id -g) --rm \
-v "/etc/group:/etc/group:ro" \
-v "/etc/passwd:/etc/passwd:ro" \
-v "/etc/shadow:/etc/shadow:ro" \
-v "/etc/sudoers.d:/etc/sudoers.d:ro" \
-v "$REPODIR:$WORKSPACE_REPODIR:rw" \
-v "$LOCAL_CCACHE_DIR:$WORKSPACE_CCACHE_REPODIR:rw" \
-v "$LOCAL_MAVEN_REPO:$WORKSPACE_MAVEN_REPODIR:rw" \
--workdir "$WORKSPACE_REPODIR" \
-e CCACHE_DISABLE \
-e CCACHE_DIR="$WORKSPACE_CCACHE_REPODIR" \
-e CMAKE_C_COMPILER_LAUNCHER="ccache" \
-e CMAKE_CXX_COMPILER_LAUNCHER="ccache" \
-e CMAKE_CUDA_COMPILER_LAUNCHER="ccache" \
-e CMAKE_CXX_LINKER_LAUNCHER="ccache" \
-e CMAKE_GENERATOR="$CMAKE_GENERATOR" \
-e CUDA_VISIBLE_DEVICES \
-e PARALLEL_LEVEL \
-e VERBOSE \
$SPARK_IMAGE_NAME \
scl enable devtoolset-9 "mvn \
-Dmaven.repo.local=$WORKSPACE_MAVEN_REPODIR \
$SCRIPTDIR/run-in-docker "mvn \
-Dmaven.repo.local=$LOCAL_MAVEN_REPO \
-DCUDF_USE_PER_THREAD_DEFAULT_STREAM=$CUDF_USE_PER_THREAD_DEFAULT_STREAM \
-DUSE_GDS=$USE_GDS \
$_CUDF_CLEAN_SKIP \
73 changes: 73 additions & 0 deletions build/run-in-docker
@@ -0,0 +1,73 @@
#!/bin/bash

#
# Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

# Run a command in a Docker container with devtoolset

set -e

# Base paths relative to this script's location
SCRIPTDIR=$(cd $(dirname $0); pwd)
REPODIR=$SCRIPTDIR/..

CUDA_VERSION=${CUDA_VERSION:-11.5.0}
DOCKER_CMD=${DOCKER_CMD:-docker}
LOCAL_CCACHE_DIR=${LOCAL_CCACHE_DIR:-"$HOME/.ccache"}
LOCAL_MAVEN_REPO=${LOCAL_MAVEN_REPO:-"$HOME/.m2/repository"}

SPARK_IMAGE_NAME="spark-rapids-jni-build:${CUDA_VERSION}-devel-centos7"

# ensure directories exist
mkdir -p "$LOCAL_CCACHE_DIR" "$LOCAL_MAVEN_REPO"

$DOCKER_CMD build -f $REPODIR/ci/Dockerfile \
--build-arg CUDA_VERSION=$CUDA_VERSION \
-t $SPARK_IMAGE_NAME \
$REPODIR/build

if [[ "$DOCKER_CMD" == "docker" ]]; then
DOCKER_GPU_OPTS="--gpus all"
fi

if (( $# == 0 )); then
# no arguments gets an interactive shell
DOCKER_ARGS="/bin/bash"
else
DOCKER_ARGS="$*"
fi

$DOCKER_CMD run $DOCKER_GPU_OPTS -it -u $(id -u):$(id -g) --rm \
-v "/etc/group:/etc/group:ro" \
-v "/etc/passwd:/etc/passwd:ro" \
-v "/etc/shadow:/etc/shadow:ro" \
-v "/etc/sudoers.d:/etc/sudoers.d:ro" \
-v "$REPODIR:$REPODIR:rw" \
-v "$LOCAL_CCACHE_DIR:$LOCAL_CCACHE_DIR:rw" \
-v "$LOCAL_MAVEN_REPO:$LOCAL_MAVEN_REPO:rw" \
--workdir "$REPODIR" \
-e CCACHE_DISABLE \
-e CCACHE_DIR="$LOCAL_CCACHE_DIR" \
-e CMAKE_C_COMPILER_LAUNCHER="ccache" \
-e CMAKE_CXX_COMPILER_LAUNCHER="ccache" \
-e CMAKE_CUDA_COMPILER_LAUNCHER="ccache" \
-e CMAKE_CXX_LINKER_LAUNCHER="ccache" \
-e CMAKE_GENERATOR \
-e CUDA_VISIBLE_DEVICES \
-e PARALLEL_LEVEL \
-e VERBOSE \
$SPARK_IMAGE_NAME \
scl enable devtoolset-9 "$DOCKER_ARGS"
3 changes: 2 additions & 1 deletion ci/nightly-build.sh
@@ -26,4 +26,5 @@ mvn clean package ${MVN_MIRROR} \
-Psource-javadoc \
-DCPP_PARALLEL_LEVEL=${PARALLEL_LEVEL} \
-Dlibcudf.build.configure=true \
-DUSE_GDS=ON -Dtest=*,!CuFileTest
-DUSE_GDS=ON -Dtest=*,!CuFileTest,!CudaFatalTest \
-DBUILD_TESTS=ON
3 changes: 2 additions & 1 deletion ci/premerge-build.sh
@@ -25,4 +25,5 @@ PARALLEL_LEVEL=${PARALLEL_LEVEL:-4}
mvn verify ${MVN_MIRROR} \
-DCPP_PARALLEL_LEVEL=${PARALLEL_LEVEL} \
-Dlibcudf.build.configure=true \
-DUSE_GDS=ON -Dtest=*,!CuFileTest
-DUSE_GDS=ON -Dtest=*,!CuFileTest,!CudaFatalTest \
-DBUILD_TESTS=ON
2 changes: 1 addition & 1 deletion ci/submodule-sync.sh
@@ -68,7 +68,7 @@ set +e
mvn verify ${MVN_MIRROR} \
-DCPP_PARALLEL_LEVEL=${PARALLEL_LEVEL} \
-Dlibcudf.build.configure=true \
-DUSE_GDS=ON -Dtest=*,!CuFileTest
-DUSE_GDS=ON -Dtest=*,!CuFileTest,!CudaFatalTest
verify_status=$?
set -e
