
Initial resize scheduler #3556

Merged: 38 commits into main, Dec 17, 2024

Conversation

@naoyam (Collaborator) commented Dec 10, 2024

This is a very preliminary version of a new scheduler mainly targeted at RoPE. I will incrementally extend this scheduler to be more flexible and performant, but for now it only handles a fusion that has pointwise ops and a single resize-based tensor op such as SliceOp or PadOp. The scheduling strategy is also pretty naive at this point and is manually demonstrated at #3549 and #3555, but the main point is that resize-based tensor ops like SliceOp or PadOp no longer need to have their inputs as fusion inputs.

The new scheduler is currently placed after the reduction scheduler and before the transpose and pointwise schedulers:

SchedulerType::ExprEval,
    SchedulerType::NoOp,
    SchedulerType::Matmul,
    SchedulerType::Reduction,
    SchedulerType::Resize, <-- New
    SchedulerType::Transpose,
    SchedulerType::PointWise,
    SchedulerType::InnerPersistent,
    SchedulerType::OuterPersistent,
    SchedulerType::InnerOuterPersistent};

https://github.com/NVIDIA/Fuser/pull/3556/files#diff-c0d261d44c61935fa2d5398f0ac52bd6ea077c6892fb5629c03a425a55fc32f2R64-R74

There are several small changes to some of the existing tests, mainly those on segmentation and alias support, since this new scheduler may change how a fusion is segmented when resize is used. There's one thing I haven't addressed (#3556 (comment)), which I'm tracking with a separate issue.
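
For illustration, here is a rough sketch of the kind of fusion this initial version is meant to accept: pointwise ops plus a single resize-based op such as PadOp, whose input is an intermediate tensor rather than a fusion input. This is not code from the PR; the builder calls (makeConcreteTensor, the pad widths created via IrBuilder) follow the style of the existing resize tests and may differ in detail.

// Sketch only: one pointwise producer, one PadOp, one pointwise consumer.
auto fusion_ptr = std::make_unique<Fusion>();
Fusion& fusion = *fusion_ptr;
FusionGuard fg(&fusion);

auto tv0 = makeConcreteTensor({16, 32});
fusion.addInput(tv0);

// Pointwise producer of the resize-based op; note it is not a fusion input.
auto tv1 = sin(tv0);

// The single resize-based op: pad the innermost dimension by one on each side.
auto tv2 = pad(tv1, {IrBuilder::create<Val>(1L), IrBuilder::create<Val>(1L)});

auto tv3 = cos(tv2);
fusion.addOutput(tv3);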

@naoyam force-pushed the resize_scheduler_initial_version branch 2 times, most recently from 5bde3d4 to 7e7db61 on December 10, 2024 20:05
@@ -4096,64 +4108,85 @@ TEST_F(ResizeTest, PropagateSliceToInputs) {
auto tv0 = makeConcreteTensor(shape);
fusion.addInput(tv0);

auto tv1 = set(tv0);
// Don't use set here as it gets taken by the no-op scheduler
auto tv1 = sin(tv0);

@naoyam (Collaborator, Author):

The changes from set to sin or cos are just to keep the preseg transformation from kicking in.

@naoyam (Collaborator, Author) commented Dec 10, 2024:

Nothing changed in the tests here (except replacing set with sin and one disabled test); I just extended some of the existing tests to also use the resize scheduler. Not all patterns are supported yet, so those cases just call GTEST_SKIP for now.

@naoyam (Collaborator, Author):

This is just moved from pointwise_utils.h

@naoyam (Collaborator, Author):

Just moved from pointwise_utils to domain_map


namespace nvfuser {

bool ResizeScheduler::canScheduleCompileTime(Fusion* fusion) {

@naoyam (Collaborator, Author):

In this initial version, I'm trying to make it very restrictive. Will have several follow-up PRs to schedule the whole RoPE module.
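
As a rough sketch of the restriction described here (this is not the actual body of canScheduleCompileTime from the PR; the specific checks are assumptions based on the PR description, which limits the scheduler to pointwise ops plus a single resize-based op):

// Illustration only: accept a fusion consisting of pointwise ops plus exactly
// one resize-based op such as SliceOp or PadOp.
bool canScheduleResizeSketch(Fusion* fusion) {
  int64_t num_resize_ops = 0;
  for (Expr* expr : fusion->exprs()) {
    if (expr->isA<SliceOp>() || expr->isA<PadOp>()) {
      ++num_resize_ops;
    } else if (
        !expr->isA<UnaryOp>() && !expr->isA<BinaryOp>() &&
        !expr->isA<TernaryOp>()) {
      // Anything that is not a simple pointwise op is rejected for now.
      return false;
    }
  }
  // Exactly one resize-based tensor op is allowed in this initial version.
  return num_resize_ops == 1;
}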

#include <scheduler/utils.h>

namespace nvfuser {
namespace pointwise_utils {

// DomainMap uses the ComputeAtMap to find a reference TensorView

@naoyam (Collaborator, Author):

This part is moved to scheduler/tools/domain_map.h

@@ -29,37 +29,6 @@ namespace {
// Unused at the moment, commenting for clang tidy
constexpr int64_t kThreadX = 128;

class DomainMap : public pointwise_utils::DomainMap {

@naoyam (Collaborator, Author):

This part is moved to pointwise_utils.h so that it can also be used from the resize scheduler.

@@ -74,5 +30,44 @@ inline int64_t nRootDims(const TensorView* tv) {
return tv_n_dims;
}

class DomainMap : public scheduler_tools::DomainMap {

@naoyam (Collaborator, Author):

This is moved from pointwise.cpp

Base automatically changed from rotation_residual_support to main December 10, 2024 22:46
@@ -432,19 +403,11 @@ std::unique_ptr<PointwiseParams> getPointwiseHeuristics(
return params;
}

// Return reference tensor view.

@naoyam (Collaborator, Author):

Just moved to pointwise_utils

};

// Return reference tensor view.
inline TensorView* getReferenceTensor(Fusion* fusion) {

@naoyam (Collaborator, Author):

Moved from pointwise.cpp. Also shortened the name a bit (was getReferenceTensorView)
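
For instance, a caller can now obtain the reference tensor through the shared helper; the header path and namespace below are assumptions based on the comments above rather than code from the PR.

// Sketch only: use the shared helper so the pointwise and resize schedulers
// rely on the same reference-tensor logic.
#include <scheduler/pointwise_utils.h>

void scheduleWithReference(Fusion* fusion) {
  TensorView* reference = pointwise_utils::getReferenceTensor(fusion);
  if (reference == nullptr) {
    // No valid reference tensor could be found; bail out.
    return;
  }
  // ... schedule the reference tensor and propagate to the rest of the fusion ...
}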

@naoyam (Collaborator, Author) commented Dec 11, 2024: !test

@@ -520,6 +520,9 @@ TEST_F(AliasTest, AliasOutputBeforeNonAliasOutput) {
testValidate(
executor_cache.fusion(), out_tensors, {in_tensor}, __LINE__, __FILE__);

// TODO: Fix the alias support

@naoyam (Collaborator, Author):

This is broken for now. I need to understand how it actually worked before this PR.

@@ -959,34 +962,6 @@ TEST_F(AliasTest, SourceIsBothInputAndOutput) {
EXPECT_EQ(in_tensor.data_ptr(), out_tensors[1].data_ptr());
}

TEST_F(AliasTest, SegmentBoundary) {

@naoyam (Collaborator, Author):

Probably not relevant as this isn't segmented anymore

const auto num_segments = kernel_runtime->fusionSegments()->groups().size();
NVF_CHECK(num_segments == 3, "Expect 3 segments, got: ", num_segments);
EXPECT_EQ(num_segments, 2) << "Expect 2 segments, got: " << num_segments;
for (const auto& exec : kernel_runtime->executors()) {

@naoyam (Collaborator, Author):

This is now segmented into just two kernels.

if (!exec->isA<KernelExecutor>()) {
continue;
}
if (kernel_runtime->schedulerHeuristics()

@naoyam (Collaborator, Author):

The gmem requirement isn't relevant for the resize scheduler

@naoyam force-pushed the resize_scheduler_initial_version branch from 4ad2ff7 to e8cb381 on December 11, 2024 09:22
@naoyam changed the base branch from main to enable_id_model_for_resize on December 11, 2024 09:22

@naoyam (Collaborator, Author) commented Dec 13, 2024: !test

@naoyam (Collaborator, Author) commented Dec 13, 2024: !test

@naoyam (Collaborator, Author) commented Dec 15, 2024: !test

@naoyam (Collaborator, Author) commented Dec 15, 2024: !test

@naoyam (Collaborator, Author) commented Dec 15, 2024: !test

@naoyam (Collaborator, Author) commented Dec 16, 2024: !test

jacobhinkle added a commit that referenced this pull request Dec 16, 2024
…bolicSizes) (#3578)

Stacked on #3585 

`StmtSort::getStmtsTo` may not grab all active iter domains if IDs are
connected in an unconventional way. For example, we can set the loop
domain of a tensor as a producer of its logical domain, but due to the
nature of `IterVisitor`, such ID dependency patterns are not supported,
meaning `StmtSort::getStmtsTo` would fail to grab all valid IDs and
their exprs.

I only recently noticed this issue while working on #3556; specifically, it was exposed as an inconsistent replacement of extent vals. I've been experimenting with such patterns of domains, but I hadn't seen this before, likely because I was using only static-shape tensors for convenience.

To fix the issue, I added a variation of `StmtSort::getStmtsTo`, which traverses a fusion as usual but stops at TensorView. For each TensorView, instead of using `IterVisitor`, it uses `TensorDomain::getAllStatements()`, which combines both `TensorDomain::allIDs()` and `TensorDomain::allExprs()`, and traverses the IDs and exprs in the returned order.

It's a somewhat naive implementation, but I think it's good enough for now, and I don't have any other immediate idea to try.

I changed `ValReplacementMutator` to use the new interface. That's the
only use for now.

---------

Co-authored-by: Jacob Hinkle <[email protected]>
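
A rough sketch of the traversal variant described above (signatures simplified; this is not the actual implementation): traverse toward the given values as usual, but expand each TensorView through TensorDomain::getAllStatements() so that iter domains connected in unconventional ways, such as loop domains defined as producers of the logical domain, are not missed.

// Sketch only; StmtSort::getStmtsTo is assumed to provide the usual traversal.
std::vector<Statement*> getAllStmtsToSketch(const std::vector<Val*>& to) {
  std::vector<Statement*> result;
  for (Statement* stmt : StmtSort::getStmtsTo(to)) {
    result.push_back(stmt);
    if (stmt->isA<TensorView>()) {
      auto* tv = stmt->as<TensorView>();
      // getAllStatements() combines TensorDomain::allIDs() and
      // TensorDomain::allExprs() and returns them in a consistent order.
      for (Statement* id_stmt : tv->domain()->getAllStatements()) {
        result.push_back(id_stmt);
      }
    }
  }
  return result;
}
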
naoyam added a commit that referenced this pull request Dec 17, 2024
Followup to #3514. Use `compareDomainWithReference` from the
TensorDomain constructors too.

This change is required for #3556.

@naoyam (Collaborator, Author) commented Dec 17, 2024: !test

@naoyam (Collaborator, Author) commented Dec 17, 2024:

All tests passed as of c264867. I decided to make the scheduler opt-in for now since it's unlikely to give any benefit yet. The NVFUSER_ENABLE option resize_scheduler can be used to enable it.
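
For example, a test run might opt in like this (the option name comes from the comment above; the binary and filter shown are just placeholders):

# Enable the opt-in resize scheduler via the NVFUSER_ENABLE environment option.
NVFUSER_ENABLE=resize_scheduler ./bin/test_nvfuser --gtest_filter='ResizeTest*'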

@naoyam merged commit a880557 into main Dec 17, 2024
48 checks passed
@naoyam deleted the resize_scheduler_initial_version branch December 17, 2024 09:27
naoyam added a commit that referenced this pull request Dec 20, 2024
Followup to #3556. Currently, the resize scheduler is only allowed with
a single slice or pad. This PR allows for fusing multiple ops as long as
they don't conflict. Please see the
[comment](https://github.com/NVIDIA/Fuser/pull/3611/files#diff-b066c49d399243d3be36a44f1221490b9a2f50e41074feab836bc9bb6ee71180R25-R100)
for `getNonExclusiveResizeInfo`.

In this PR, if there's a conflict, the fusion is simply rejected. A
followup PR will address this limitation by replicating computations.