remove dependency on cugraph-ops #99

tingyu66 · 2024-12-19T03:59:17Z

Address #81

copy-pr-bot · 2024-12-19T04:00:38Z

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

tingyu66 · 2024-12-19T04:07:06Z

/ok to test

bdice · 2024-12-19T15:28:11Z

dependencies.yaml

            packages:
-              - pytorch-cuda=12.4
-          - matrix: {cuda: "11.8"}
+            - pytorch-gpu>=2.3=*cuda120*


This is already using conda-forge, I think? pytorch-gpu is a conda-forge package, not a pytorch channel package. Also, the latest conda-forge builds are built with CUDA 12.6. CUDA 12.0 is no longer used to build.

For compatibility reasons we may want to stick to older builds of pytorch-gpu (built with cuda120) for now. We will hopefully be able to relax this in the future.

Yes, this PR switches to conda-forge::pytorch-gpu since pytorch channel will discontinue.

Also, the latest conda-forge builds are built with CUDA 12.6. CUDA 12.0 is no longer used to build.

For compatibility reasons we may want to stick to older builds of pytorch-gpu (built with cuda120) for now. We will hopefully be able to relax this in the future.

Oh, I had not noticed that the most recent build (_306) is only against 12.6. I agree with keeping 12.0 for better backward compatibility. However, the CUDA 11 build seems missing. Do we have details on their build matrix?

CUDA 11 builds were dropped recently. You may need an older version for CUDA 11 compatibility. I also saw this while working on rapidsai/cudf#17475. mamba search -c conda-forge "pytorch=*=cuda118*" indicates the latest version with CUDA 11 support is 2.5.1 build 303. The latest is 2.5.1 build 306.

For completeness, the latest CUDA 12.0 build was also 2.5.1 build 303.

Got it, thanks. It shouldn't be a dealbreaker unless another test component ends up requiring a newer version of torch on CUDA 11 down the line.

pytorch-gpu requires __cuda, and is not installable on systems without a CUDA driver. This makes it impossible to resolve the conda environment needed for devcontainers jobs in CI, which are CPU-only.

Note: many CUDA packages, including RAPIDS, are explicitly designed not to have __cuda as a run requirement, because it makes it impossible to install on a CPU node before using that environment on another system with a GPU.

It looks like if we just use pytorch instead of pytorch-gpu, we still get GPU builds:

CUDA 11 driver present:

CONDA_OVERRIDE_CUDA="11.8" conda create -n test --dry-run pytorch

shows

pytorch 2.5.1 cuda118_py313h40cdc2d_303 conda-forge

CUDA 12 driver present:

CONDA_OVERRIDE_CUDA="12.5" conda create -n test --dry-run pytorch

shows

pytorch 2.5.1 cuda126_py313hae2543e_306 conda-forge

No CUDA driver present:

CONDA_OVERRIDE_CUDA="" mamba create -n test --dry-run pytorch

shows

pytorch 2.5.1 cpu_mkl_py313_h90df46e_108 conda-forge

This should be sufficient. Let's try using just pytorch instead of pytorch-gpu with specific CUDA build selectors.

There are two benefits here, if my proposal above works.

devcontainers CI job would get CPU-only builds, which should still be fine for builds

We don't need to specify CUDA versions, so this dependency doesn't have to be "specific" to CUDA 11/12

I agree, let's try with "pytorch" instead of "pytorch-gpu".

That opens up a risk that there may be situations where the solver chooses a CPU-only version because of some conflict, but hopefully cugraph-pyg can detect that with torch.cuda.is_available() or similar and raise an informative error saying something like "if using conda, try 'conda install cugraph-pyg pytorch-gpu'".

We don't need to specify CUDA versions, so this dependency doesn't have to be "specific" to CUDA 11/12

I looked into this today... we shouldn't have needed to specify CUDA versions in build strings for pytorch-gpu anyway, as long as we're pinning the cuda-version package somewhere (for example, in the run: dependencies of cugraph).

Looks like pytorch-gpu is == pinned to a specific pytorch.

And the pytorch CUDA builds all have run: dependencies on cuda-version.

So here in cugraph-pyg, just having cuda-version as a run: dependency would be enough to ensure a compatible pytorch-gpu / pytorch is pulled.

@jameslamb These simplifications to drop build string info are only possible now with conda-forge, iirc. I believe more complexity was required when we used the pytorch channel, and we probably just carried that over when switching to conda-forge.

copy-pr-bot · 2024-12-22T21:45:24Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

tingyu66 · 2024-12-22T21:46:06Z

/ok to test

jameslamb · 2025-01-06T19:52:19Z

/ok to test

jameslamb

Hey @tingyu66 , I'd like to help keep this moving forward. Left some questions for your consideration.

dependencies.yaml

python/cugraph-pyg/cugraph_pyg/nn/conv/__init__.py

jameslamb · 2025-01-06T22:04:37Z

/ok to test

tingyu66 · 2025-01-06T22:10:27Z

/ok to test

tingyu66 · 2025-01-07T19:46:09Z

/ok to test

jameslamb · 2025-01-16T20:18:36Z

/ok to test

jameslamb

These changes look good to me as-is.

However, it looks like this would still leave pylibwholegraph with a hard runtime dependency on pylibcugraphops.

I pulled this branch today and looked around like this:

git grep -E -i 'cugraph.*ops'

cugraph-gnn/python/pylibwholegraph/pylibwholegraph/torch/cugraphops/gat_conv.py

Lines 18 to 19 in a9ab8b4

    
           from pylibcugraphops.pytorch.operators import mha_gat_n2n as GATConvAgg 
        
           from pylibcugraphops.pytorch import SampledCSC

cugraph-gnn/python/pylibwholegraph/pylibwholegraph/torch/cugraphops/sage_conv.py

Lines 19 to 20 in a9ab8b4

    
           from pylibcugraphops.pytorch.operators import agg_concat_n2n as SAGEConvAgg 
        
           from pylibcugraphops.pytorch import SampledCSC

Can that be addressed in this PR? Without removing that, we can't stop building and shipping pylibcugraphops packages.

And there are other places (like cugraph-pyg tests decorated @pytest.mark.cugraph_ops), that @bdice mentioned here #99 (review) ... can all those things also be removed?

tingyu66 · 2025-01-16T20:47:48Z

These changes look good to me as-is.

However, it looks like this would still leave pylibwholegraph with a hard runtime dependency on pylibcugraphops.

I pulled this branch today and looked around like this:
git grep -E -i 'cugraph.*ops'
cugraph-gnn/python/pylibwholegraph/pylibwholegraph/torch/cugraphops/gat_conv.py

Lines 18 to 19 in a9ab8b4

from pylibcugraphops.pytorch.operators import mha_gat_n2n as GATConvAgg

from pylibcugraphops.pytorch import SampledCSC

cugraph-gnn/python/pylibwholegraph/pylibwholegraph/torch/cugraphops/sage_conv.py

Lines 19 to 20 in a9ab8b4

from pylibcugraphops.pytorch.operators import agg_concat_n2n as SAGEConvAgg

from pylibcugraphops.pytorch import SampledCSC

Can that be addressed in this PR? Without removing that, we can't stop building and shipping pylibcugraphops packages.

And there are other places (like cugraph-pyg tests decorated @pytest.mark.cugraph_ops), that @bdice mentioned here #99 (review) ... can all those things also be removed?

I will remove the usage of the pytest decorator. I didn't realize that pylibwholegraph.torch.gnn_model also includes the cugraph-ops code path. I'll address it once I’m back at my keyboard today.

jameslamb · 2025-01-16T20:51:47Z

Thank you! Sorry for not noting it earlier. I recommend you look at everything matched by this:

git grep -E -i 'cugraph.*ops'

And see if it can be removed.

Also update this to latest branch-25.02, to pull in Alex's testing fixes from #82

bdice · 2025-01-16T23:26:16Z

CI is currently passing. This PR is already pretty large. If this is in a relatively good state, albeit partially incomplete, perhaps we could merge this and continue the remaining work in a follow-up PR.

tingyu66 · 2025-01-17T03:38:08Z

/ok to test

tingyu66 · 2025-01-17T03:56:29Z

CI is currently passing. This PR is already pretty large. If this is in a relatively good state, albeit partially incomplete, perhaps we could merge this and continue the remaining work in a follow-up PR.

@bdice While we're still waiting for the codeowner review, I went ahead and squeezed in the fix. Thanks!

-> git grep -E -i 'cugraph.*ops'

cugraph-dgl/cugraph_dgl/dataloading/utils/sampling_helpers.py:    # and int64 respectively. Since pylibcugraphops binding code doesn't
cugraph-pyg/cugraph_pyg/tests/sampler/test_sampler_utils.py:@pytest.mark.cugraph_ops
cugraph-pyg/cugraph_pyg/tests/sampler/test_sampler_utils.py:@pytest.mark.cugraph_ops
cugraph-pyg/cugraph_pyg/tests/sampler/test_sampler_utils_mg.py:@pytest.mark.cugraph_ops
cugraph-pyg/cugraph_pyg/tests/sampler/test_sampler_utils_mg.py:@pytest.mark.cugraph_ops
cugraph-pyg/pytest.ini:          cugraph_ops: Tests requiring cugraph-ops

@jameslamb Above are the only cugraph-ops occurrences. The first one is a helpful note. Regarding the pytest markers, I've lost track of why they're marked that way in the first place, as they do not call cugraph-ops internally. @alexbarghi-nv, are those uniform_neighbor_sample calls in these tests specific to cugraph-ops use cases?

alexbarghi-nv · 2025-01-17T04:18:40Z

Originally they used cugraph-ops at the C++ level but it's been taken out now. We can safely remove these markers.

tingyu66 · 2025-01-17T05:15:03Z

/ok to test

jameslamb

Thank you so much! I think all of my suggestions have been addressed and that we should merge this.

@alexbarghi-nv it looks like your approval would count for the other codeowners groups that need to approve here.

bdice

Build changes look good! Thanks for all this cleanup work!

alexbarghi-nv · 2025-01-17T19:45:43Z

/merge

Contributes to rapidsai/build-infra#155 (private issue) Stops triggering nightly builds and tests of `cugraph-ops`. ## Notes for Reviewers This should not be merged until the following are complete: * [x] rapidsai/cugraph-gnn#99

Contributes to rapidsai/build-infra#155 (private issue) Removes references to `cugraph-ops`. RAPIDS is dropping `cugraph-ops` completely (archiving https://github.com/rapidsai/cugraph-ops and no longer publishing packages) in v25.02. ## Notes for Reviewers Everything in the diff is based on this search: ```shell git grep -i ops ``` This should not be merged until the following are complete: * [x] rapidsai/cugraph-gnn#99 * [x] rapidsai/workflows#70

make cugraph-ops optional for cugraph-pyg

4f4c82e

tingyu66 requested review from a team as code owners December 19, 2024 03:59

tingyu66 requested a review from KyleFromNVIDIA December 19, 2024 03:59

tingyu66 marked this pull request as draft December 19, 2024 04:00

Merge branch 'branch-25.02' into rm-cugraph-ops

d3f975a

tingyu66 added breaking Introduces a breaking change improvement Improves an existing functionality labels Dec 19, 2024

bdice reviewed Dec 19, 2024

View reviewed changes

tingyu66 added 2 commits December 22, 2024 16:42

Merge branch 'branch-25.02' into rm-cugraph-ops

eddabef

format

bafd813

Merge branch 'branch-25.02' into rm-cugraph-ops

a111a09

jameslamb self-requested a review January 6, 2025 20:15

jameslamb reviewed Jan 6, 2025

View reviewed changes

dependencies.yaml Show resolved Hide resolved

python/cugraph-pyg/cugraph_pyg/nn/conv/__init__.py Outdated Show resolved Hide resolved

pre-commit

8e5a0f8

use just pytorch instead of pytorch-gpu

4992680

tingyu66 added 3 commits January 6, 2025 23:13

update -pyg examples

c48941a

skip importing convs in cugraph-dgl

1120cdb

Merge branch 'branch-25.02' into rm-cugraph-ops

0d285e0

tingyu66 marked this pull request as ready for review January 7, 2025 22:36

jameslamb mentioned this pull request Jan 8, 2025

remove references to cugraph-ops rapidsai/cugraph-docs#72

Merged

modify conda recipes

c4019df

jameslamb mentioned this pull request Jan 14, 2025

remove cugraph-ops rapidsai/devcontainers#437

Merged

2 tasks

Merge branch 'branch-25.02' into rm-cugraph-ops

7eb128a

jameslamb self-requested a review January 16, 2025 20:24

jameslamb reviewed Jan 16, 2025

View reviewed changes

tingyu66 added 2 commits January 16, 2025 22:22

clean up scripts and tests

a5bdd81

remove cugraph-ops models from pylibwholegraph.torch

51c30c2

tingyu66 requested review from a team as code owners January 17, 2025 03:32

Merge branch 'branch-25.02' into rm-cugraph-ops

7fe5168

remove pytest markers

f257f7b

tingyu66 changed the title ~~make cugraph-ops optional for cugraph-gnn packages~~ remove dependency on cugraph-ops Jan 17, 2025

jameslamb approved these changes Jan 17, 2025

View reviewed changes

bdice approved these changes Jan 17, 2025

View reviewed changes

bdice removed the request for review from KyleFromNVIDIA January 17, 2025 16:11

alexbarghi-nv approved these changes Jan 17, 2025

View reviewed changes

rapids-bot bot merged commit d38b832 into rapidsai:branch-25.02 Jan 17, 2025
82 checks passed

This was referenced Jan 17, 2025

Remove dependency on libcugraphops / pylibcugraphops #81

Open

Updates for 25.02 rapidsai/rapids-metadata#38

Merged

tingyu66 deleted the rm-cugraph-ops branch January 17, 2025 20:03

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

remove dependency on cugraph-ops #99

remove dependency on cugraph-ops #99

tingyu66 commented Dec 19, 2024

copy-pr-bot bot commented Dec 19, 2024

tingyu66 commented Dec 19, 2024

bdice Dec 19, 2024 •

edited

Loading

bdice Dec 19, 2024

tingyu66 Dec 20, 2024

bdice Dec 20, 2024

bdice Dec 20, 2024

tingyu66 Dec 20, 2024

bdice Jan 6, 2025 •

edited

Loading

bdice Jan 6, 2025

jameslamb Jan 7, 2025

bdice Jan 7, 2025

copy-pr-bot bot commented Dec 22, 2024

tingyu66 commented Dec 22, 2024

jameslamb commented Jan 6, 2025

jameslamb left a comment

jameslamb commented Jan 6, 2025

tingyu66 commented Jan 6, 2025

tingyu66 commented Jan 7, 2025

jameslamb commented Jan 16, 2025

jameslamb left a comment

tingyu66 commented Jan 16, 2025

jameslamb commented Jan 16, 2025 •

edited

Loading

bdice commented Jan 16, 2025

tingyu66 commented Jan 17, 2025

tingyu66 commented Jan 17, 2025

alexbarghi-nv commented Jan 17, 2025

tingyu66 commented Jan 17, 2025

jameslamb left a comment

bdice left a comment

alexbarghi-nv commented Jan 17, 2025

	from pylibcugraphops.pytorch.operators import mha_gat_n2n as GATConvAgg
	from pylibcugraphops.pytorch import SampledCSC

remove dependency on cugraph-ops #99

remove dependency on cugraph-ops #99

Conversation

tingyu66 commented Dec 19, 2024

copy-pr-bot bot commented Dec 19, 2024

tingyu66 commented Dec 19, 2024

bdice Dec 19, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

bdice Jan 6, 2025 • edited Loading

Choose a reason for hiding this comment

CUDA 11 driver present:

CUDA 12 driver present:

No CUDA driver present:

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

copy-pr-bot bot commented Dec 22, 2024

tingyu66 commented Dec 22, 2024

jameslamb commented Jan 6, 2025

jameslamb left a comment

Choose a reason for hiding this comment

jameslamb commented Jan 6, 2025

tingyu66 commented Jan 6, 2025

tingyu66 commented Jan 7, 2025

jameslamb commented Jan 16, 2025

jameslamb left a comment

Choose a reason for hiding this comment

tingyu66 commented Jan 16, 2025

jameslamb commented Jan 16, 2025 • edited Loading

bdice commented Jan 16, 2025

tingyu66 commented Jan 17, 2025

tingyu66 commented Jan 17, 2025

alexbarghi-nv commented Jan 17, 2025

tingyu66 commented Jan 17, 2025

jameslamb left a comment

Choose a reason for hiding this comment

bdice left a comment

Choose a reason for hiding this comment

alexbarghi-nv commented Jan 17, 2025

bdice Dec 19, 2024 •

edited

Loading

bdice Jan 6, 2025 •

edited

Loading

jameslamb commented Jan 16, 2025 •

edited

Loading