Commit

Merge pull request #485 from NvTimLiu/merge-22.08-to-main
Merge 22.08 to main [skip ci]
pxLi authored Aug 18, 2022
2 parents aec090a + e0a6fed commit b2a39ca
Showing 26 changed files with 6,284 additions and 493 deletions.
6 changes: 3 additions & 3 deletions .github/workflows/auto-merge.yml
@@ -18,12 +18,12 @@ name: auto-merge HEAD to BASE
on:
pull_request_target:
branches:
- branch-22.06
- branch-22.08
types: [closed]

env:
HEAD: branch-22.06
BASE: branch-22.08
HEAD: branch-22.08
BASE: branch-22.10

jobs:
auto-merge:
2 changes: 1 addition & 1 deletion .gitmodules
@@ -1,4 +1,4 @@
[submodule "thirdparty/cudf"]
path = thirdparty/cudf
url = https://github.com/rapidsai/cudf.git
branch = branch-22.06
branch = branch-22.08
136 changes: 136 additions & 0 deletions CONTRIBUTING.md
@@ -74,10 +74,128 @@ to control aspects of the build:
|`CUDF_USE_PER_THREAD_DEFAULT_STREAM`|CUDA per-thread default stream |ON |
|`RMM_LOGGING_LEVEL` |RMM logging control |OFF |
|`USE_GDS` |Compile with GPU Direct Storage support|OFF |
|`BUILD_TESTS` |Compile tests |OFF |
|`BUILD_BENCHMARKS` |Compile benchmarks |OFF |
|`libcudf.build.configure` |Force libcudf build to configure |false |
|`libcudf.clean.skip` |Whether to skip cleaning libcudf build |true |
|`submodule.check.skip` |Whether to skip checking git submodules|false |
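For orientation, several of these options can be combined on one Maven command line. The sketch below is illustrative only: the flag values are examples rather than recommended defaults, and the script prints the assembled command instead of running it.

```shell
# Assemble an illustrative set of build flags (values are examples, not defaults).
MVN_FLAGS=(
  -DCUDF_USE_PER_THREAD_DEFAULT_STREAM=ON
  -DUSE_GDS=OFF
  -DBUILD_TESTS=ON
  -DBUILD_BENCHMARKS=ON
  -Dlibcudf.build.configure=true
  -Dlibcudf.clean.skip=false
  -Dsubmodule.check.skip=false
)
# Print the full command rather than executing it, so the sketch is side-effect free.
echo mvn clean install "${MVN_FLAGS[@]}"
```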


### Local testing of cross-repo contributions to cuDF, spark-rapids-jni, and spark-rapids

When we work on a feature or a bug fix that spans repositories, it is useful to run manual and
integration tests end to end on the full stack (Apache Spark with the spark-rapids plugin)
before merging the PRs.

So we are dealing with a subset of the following:

Local PR branches for
- rapidsai/cuDF, branch pr1
- NVIDIA/spark-rapids-jni, branch pr2
- NVIDIA/spark-rapids, branch pr3

Our end goal is to build the rapids-4-spark dist jar in the pr3 branch under local repo path
~/repos/NVIDIA/spark-rapids that includes changes from the pr2 branch in
~/repos/NVIDIA/spark-rapids-jni and the pr1 branch in rapidsai/cuDF that we will test
with Spark. There are two options for working on pr1.

#### Option 1: Working on the cuDF PR inside the submodule in spark-rapids-jni
To avoid retargeting the submodule to a local cuDF repo (as in Option 2 below), it may be easier
to make changes directly under ~/repos/NVIDIA/spark-rapids-jni/thirdparty/cudf.

In order to push pr1 and create a pull request, we need to add a remote for the cuDF fork in our
account to the submodule:

```bash
$ cd ~/repos/NVIDIA/spark-rapids-jni/thirdparty/cudf
$ git remote add <user> git@github.com:<user>/cudf.git
# make and commit changes
$ git push <user>
```

#### Option 2: Working on cuDF PR in a conventional local cuDF fork
Once we are done with our changes to the pr1 branch in
~/repos/rapidsai/cuDF, we git commit changes locally.

Then we cd to ~/repos/NVIDIA/spark-rapids-jni and point the cudf submodule temporarily to the pr1
branch

```bash
$ git submodule set-url thirdparty/cudf ~/repos/rapidsai/cudf
$ git submodule set-branch --branch pr1 thirdparty/cudf
```

Sync pr1 into our pr2 branch in ~/repos/NVIDIA/spark-rapids-jni
```bash
$ git submodule sync --recursive
$ git submodule update --init --recursive --remote
```
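For a concrete picture of what the retargeting records, here is a self-contained sketch run in a
scratch directory. The `set-url`/`set-branch` effects are reproduced with plain `git config` writes
to `.gitmodules`, and the path and branch name (`pr1`) are the hypothetical ones from this
walkthrough.

```shell
# Scratch-directory demo of the .gitmodules entries the retargeting produces.
workdir=$(mktemp -d)
cd "$workdir"
cat > .gitmodules <<'EOF'
[submodule "thirdparty/cudf"]
	path = thirdparty/cudf
	url = https://github.com/rapidsai/cudf.git
	branch = branch-22.08
EOF
# Equivalent of: git submodule set-url thirdparty/cudf ~/repos/rapidsai/cudf
git config --file .gitmodules submodule.thirdparty/cudf.url "$HOME/repos/rapidsai/cudf"
# Equivalent of: git submodule set-branch --branch pr1 thirdparty/cudf
git config --file .gitmodules submodule.thirdparty/cudf.branch pr1
# Read the branch back to confirm the edit.
git config --file .gitmodules submodule.thirdparty/cudf.branch
```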

#### Building final spark-rapids artifact with pr1, pr2, and pr3 changes
Regardless of which option we used to make the cuDF changes, we proceed with building
spark-rapids-jni. The spark-rapids repo will consume spark-rapids-jni with the pr1 and pr2 changes
from the local Maven cache after we run `mvn install` via `build/build-in-docker`
in ~/repos/NVIDIA/spark-rapids-jni.

Make sure to stage thirdparty/cudf with `git add` to satisfy the build's submodule check.
```bash
$ git add thirdparty/cudf
$ ./build/build-in-docker install ...
```

Now cd to ~/repos/NVIDIA/spark-rapids and build with one of the options from
[spark-rapids instructions](https://github.com/NVIDIA/spark-rapids/blob/branch-22.08/CONTRIBUTING.md#building-from-source).

```bash
$ ./build/buildall
```

Since we rely on the local Maven cache, we need to pay extra attention to make sure that
the final rapids-4-spark artifact includes the locally built dependencies as opposed to
CI-built snapshot dependencies from the remote Maven repo. Remote snapshots can slip in even
when Maven is invoked with the `--offline` or `--no-snapshot-updates` option, due to IDE-Maven
interactions in the background. To confirm that the artifact is correct we can either enable
[INFO logging in Spark](https://github.com/NVIDIA/spark-rapids/blob/4c77f0db58d229b2e6cb75c196934fcc0ae3a485/sql-plugin/src/main/scala/com/nvidia/spark/rapids/Plugin.scala#L73-L83)
or directly inspect the resulting jar for build info:
```bash
$ unzip -c dist/target/rapids-4-spark_2.12-22.08.0-SNAPSHOT-cuda11.jar *version-info.properties
Archive: dist/target/rapids-4-spark_2.12-22.08.0-SNAPSHOT-cuda11.jar
inflating: cudf-java-version-info.properties
version=22.08.0-SNAPSHOT
user=
revision=62657ad6a296ea3547417504652e3b8836b020fb
branch=testCUDF_pr1
date=2022-07-19T21:48:15Z
url=https://github.com/rapidsai/cudf.git

inflating: spark-rapids-jni-version-info.properties
version=22.08.0-SNAPSHOT
user=
revision=70adcc86a513ad6665968021c669fbca7515a188
branch=pr/user1/381
date=2022-07-19T21:48:15Z
url=git@github.com:NVIDIA/spark-rapids-jni.git

inflating: rapids4spark-version-info.properties
version=22.08.0-SNAPSHOT
cudf_version=22.08.0-SNAPSHOT
user=user1
revision=6453047ef479b5ec79384c5150c50af2f50f563e
branch=aqeFinalPlanOnGPUDoc
date=2022-07-19T21:51:52Z
url=https://github.com/NVIDIA/spark-rapids
```
and verify that the branch names and revisions in the console output
correspond to the local repos.
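This check can also be scripted. Below is a sketch: the `extract_revision` helper is ours, not part
of the build, and it is demonstrated on an inline sample rather than a real jar. In practice you
would feed it from `unzip -p <jar> '*version-info.properties'` and compare the result with
`git -C ~/repos/NVIDIA/spark-rapids-jni rev-parse HEAD`.

```shell
# Pull the revision= value out of a version-info.properties stream.
extract_revision() { awk -F= '$1 == "revision" { print $2 }'; }

# Inline sample standing in for the unzip output shown above.
sample='version=22.08.0-SNAPSHOT
revision=70adcc86a513ad6665968021c669fbca7515a188
branch=pr/user1/381'

printf '%s\n' "$sample" | extract_revision
```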

When we are ready to move on, prior to switching to another spark-rapids-jni branch
or submitting a PR to NVIDIA/spark-rapids-jni, we should undo the cudf submodule modifications.
```bash
$ cd ~/repos/NVIDIA/spark-rapids-jni
$ git restore .gitmodules
$ git restore --staged thirdparty/cudf
```
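As a self-contained illustration of the `git restore .gitmodules` step, the sketch below runs in a
scratch repo with throwaway file contents; only the restore mechanics carry over to the real
spark-rapids-jni checkout.

```shell
# Scratch-repo demo: a committed .gitmodules, a local edit, and its restore.
workdir=$(mktemp -d)
cd "$workdir"
git init -q .
echo 'branch = branch-22.08' > .gitmodules
git add .gitmodules
git -c user.email=ci@example.invalid -c user.name=ci commit -qm 'add .gitmodules'
echo 'branch = pr1' > .gitmodules   # the temporary retargeting edit
git restore .gitmodules             # discard the working-tree change
cat .gitmodules
```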

### Building on Windows in WSL2
Building on Windows can be done if your Windows build version supports
[WSL2](https://docs.microsoft.com/en-us/windows/wsl/install). You can create a minimum
@@ -93,6 +211,24 @@ and build inside WSL2, e.g.
> wsl -d Ubuntu ./build/build-in-docker clean install -DGPU_ARCHS=NATIVE -Dtest="*,!CuFileTest"
```

### Testing
Java tests are in the `src/test` directory and C++ tests are in the `src/main/cpp/tests` directory.
The C++ tests are built with the `-DBUILD_TESTS` command-line option and build into the
`target/cmake-build/gtests/` directory. Because the build runs inside a Docker container, the host
environment may not match the container well enough to run these executables, resulting in errors
finding libraries. The script `build/run-in-docker` was created to help with this situation. A test
can be run directly using this script, or the script can be run without any arguments to get an
interactive shell inside the container.
```bash
build/run-in-docker target/cmake-build/gtests/ROW_CONVERSION
```

### Benchmarks
C++ benchmarks are written using NVBench and are in the `src/main/cpp/benchmarks` directory.
Building them requires the `-DBUILD_BENCHMARKS` build option. Once built, the benchmarks can be
found in the `target/cmake-build/benchmarks/` directory. Because the build runs inside a Docker
container, the host environment may not match the container well enough to run these executables,
resulting in errors finding libraries. The script `build/run-in-docker` was created to help with
this situation. A benchmark can be run directly using this script, or the script can be run without
any arguments to get an interactive shell inside the container.
```bash
build/run-in-docker target/cmake-build/benchmarks/ROW_CONVERSION_BENCH
```
## Code contributions

### Your first issue
48 changes: 3 additions & 45 deletions build/build-in-docker
@@ -22,35 +22,17 @@ set -e

# Base paths relative to this script's location
SCRIPTDIR=$(cd $(dirname $0); pwd)
REPODIR=$SCRIPTDIR/..

CMAKE_GENERATOR=${CMAKE_GENERATOR:-Ninja}
CUDA_VERSION=${CUDA_VERSION:-11.5.0}
DOCKER_CMD=${DOCKER_CMD:-docker}
LOCAL_CCACHE_DIR=${LOCAL_CCACHE_DIR:-"$HOME/.ccache"}
LOCAL_MAVEN_REPO=${LOCAL_MAVEN_REPO:-"$HOME/.m2/repository"}
CUDF_USE_PER_THREAD_DEFAULT_STREAM=${CUDF_USE_PER_THREAD_DEFAULT_STREAM:-ON}
USE_GDS=${USE_GDS:-ON}

SPARK_IMAGE_NAME="spark-rapids-jni-build:${CUDA_VERSION}-devel-centos7"
WORKSPACE_DIR=/rapids
WORKSPACE_REPODIR="$WORKSPACE_DIR/spark-rapids-jni"
WORKSPACE_CCACHE_REPODIR="$WORKSPACE_DIR/.ccache"
WORKSPACE_MAVEN_REPODIR="$WORKSPACE_DIR/.m2/repository"
export CMAKE_GENERATOR=${CMAKE_GENERATOR:-"Ninja"}

if (( $# == 0 )); then
echo "Usage: $0 <Maven build arguments>"
exit 1
fi

# ensure directories exist
mkdir -p "$LOCAL_CCACHE_DIR" "$LOCAL_MAVEN_REPO"

$DOCKER_CMD build -f $REPODIR/ci/Dockerfile \
--build-arg CUDA_VERSION=$CUDA_VERSION \
-t $SPARK_IMAGE_NAME \
$REPODIR/build

_CUDF_CLEAN_SKIP=""
# if ccache is enabled and libcudf.clean.skip not provided
# by the user remove the cpp build directory
@@ -63,32 +45,8 @@ if [[ "$CCACHE_DISABLE" != "1" ]]; then
fi
fi

if [[ "$DOCKER_CMD" == "docker" ]]; then
DOCKER_GPU_OPTS="--gpus all"
fi

$DOCKER_CMD run $DOCKER_GPU_OPTS -it -u $(id -u):$(id -g) --rm \
-v "/etc/group:/etc/group:ro" \
-v "/etc/passwd:/etc/passwd:ro" \
-v "/etc/shadow:/etc/shadow:ro" \
-v "/etc/sudoers.d:/etc/sudoers.d:ro" \
-v "$REPODIR:$WORKSPACE_REPODIR:rw" \
-v "$LOCAL_CCACHE_DIR:$WORKSPACE_CCACHE_REPODIR:rw" \
-v "$LOCAL_MAVEN_REPO:$WORKSPACE_MAVEN_REPODIR:rw" \
--workdir "$WORKSPACE_REPODIR" \
-e CCACHE_DISABLE \
-e CCACHE_DIR="$WORKSPACE_CCACHE_REPODIR" \
-e CMAKE_C_COMPILER_LAUNCHER="ccache" \
-e CMAKE_CXX_COMPILER_LAUNCHER="ccache" \
-e CMAKE_CUDA_COMPILER_LAUNCHER="ccache" \
-e CMAKE_CXX_LINKER_LAUNCHER="ccache" \
-e CMAKE_GENERATOR="$CMAKE_GENERATOR" \
-e CUDA_VISIBLE_DEVICES \
-e PARALLEL_LEVEL \
-e VERBOSE \
$SPARK_IMAGE_NAME \
scl enable devtoolset-9 "mvn \
-Dmaven.repo.local=$WORKSPACE_MAVEN_REPODIR \
$SCRIPTDIR/run-in-docker "mvn \
-Dmaven.repo.local=$LOCAL_MAVEN_REPO \
-DCUDF_USE_PER_THREAD_DEFAULT_STREAM=$CUDF_USE_PER_THREAD_DEFAULT_STREAM \
-DUSE_GDS=$USE_GDS \
$_CUDF_CLEAN_SKIP \
73 changes: 73 additions & 0 deletions build/run-in-docker
@@ -0,0 +1,73 @@
#!/bin/bash

#
# Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

# Run a command in a Docker container with devtoolset

set -e

# Base paths relative to this script's location
SCRIPTDIR=$(cd $(dirname $0); pwd)
REPODIR=$SCRIPTDIR/..

CUDA_VERSION=${CUDA_VERSION:-11.5.0}
DOCKER_CMD=${DOCKER_CMD:-docker}
LOCAL_CCACHE_DIR=${LOCAL_CCACHE_DIR:-"$HOME/.ccache"}
LOCAL_MAVEN_REPO=${LOCAL_MAVEN_REPO:-"$HOME/.m2/repository"}

SPARK_IMAGE_NAME="spark-rapids-jni-build:${CUDA_VERSION}-devel-centos7"

# ensure directories exist
mkdir -p "$LOCAL_CCACHE_DIR" "$LOCAL_MAVEN_REPO"

$DOCKER_CMD build -f $REPODIR/ci/Dockerfile \
--build-arg CUDA_VERSION=$CUDA_VERSION \
-t $SPARK_IMAGE_NAME \
$REPODIR/build

if [[ "$DOCKER_CMD" == "docker" ]]; then
DOCKER_GPU_OPTS="--gpus all"
fi

if (( $# == 0 )); then
# no arguments gets an interactive shell
DOCKER_ARGS="/bin/bash"
else
DOCKER_ARGS="$*"
fi

$DOCKER_CMD run $DOCKER_GPU_OPTS -it -u $(id -u):$(id -g) --rm \
-v "/etc/group:/etc/group:ro" \
-v "/etc/passwd:/etc/passwd:ro" \
-v "/etc/shadow:/etc/shadow:ro" \
-v "/etc/sudoers.d:/etc/sudoers.d:ro" \
-v "$REPODIR:$REPODIR:rw" \
-v "$LOCAL_CCACHE_DIR:$LOCAL_CCACHE_DIR:rw" \
-v "$LOCAL_MAVEN_REPO:$LOCAL_MAVEN_REPO:rw" \
--workdir "$REPODIR" \
-e CCACHE_DISABLE \
-e CCACHE_DIR="$LOCAL_CCACHE_DIR" \
-e CMAKE_C_COMPILER_LAUNCHER="ccache" \
-e CMAKE_CXX_COMPILER_LAUNCHER="ccache" \
-e CMAKE_CUDA_COMPILER_LAUNCHER="ccache" \
-e CMAKE_CXX_LINKER_LAUNCHER="ccache" \
-e CMAKE_GENERATOR \
-e CUDA_VISIBLE_DEVICES \
-e PARALLEL_LEVEL \
-e VERBOSE \
$SPARK_IMAGE_NAME \
scl enable devtoolset-9 "$DOCKER_ARGS"
3 changes: 2 additions & 1 deletion ci/nightly-build.sh
@@ -26,4 +26,5 @@ mvn clean package ${MVN_MIRROR} \
-Psource-javadoc \
-DCPP_PARALLEL_LEVEL=${PARALLEL_LEVEL} \
-Dlibcudf.build.configure=true \
-DUSE_GDS=ON -Dtest=*,!CuFileTest
-DUSE_GDS=ON -Dtest=*,!CuFileTest,!CudaFatalTest \
-DBUILD_TESTS=ON
3 changes: 2 additions & 1 deletion ci/premerge-build.sh
@@ -25,4 +25,5 @@ PARALLEL_LEVEL=${PARALLEL_LEVEL:-4}
mvn verify ${MVN_MIRROR} \
-DCPP_PARALLEL_LEVEL=${PARALLEL_LEVEL} \
-Dlibcudf.build.configure=true \
-DUSE_GDS=ON -Dtest=*,!CuFileTest
-DUSE_GDS=ON -Dtest=*,!CuFileTest,!CudaFatalTest \
-DBUILD_TESTS=ON
2 changes: 1 addition & 1 deletion ci/submodule-sync.sh
@@ -68,7 +68,7 @@ set +e
mvn verify ${MVN_MIRROR} \
-DCPP_PARALLEL_LEVEL=${PARALLEL_LEVEL} \
-Dlibcudf.build.configure=true \
-DUSE_GDS=ON -Dtest=*,!CuFileTest
-DUSE_GDS=ON -Dtest=*,!CuFileTest,!CudaFatalTest
verify_status=$?
set -e
