Add support for Acceldata ODP-3.5.1 #12621


Open · wants to merge 5 commits into branch-25.08

Conversation

prabhjyotsingh

Add shim layer for Acceldata ODP 3.5.1

@prabhjyotsingh prabhjyotsingh requested a review from a team as a code owner May 1, 2025 18:54
@prabhjyotsingh (Author)

@shubhluck @kuhushukla can you please help review this PR?

@prabhjyotsingh prabhjyotsingh force-pushed the branch-25.06-351odp branch 2 times, most recently from 60c10f2 to fbf3bb0 Compare May 1, 2025 19:02
@@ -381,7 +381,7 @@ object HybridExecutionUtils extends PredicateHelper {
}
}

/**
/**
Collaborator:

Nit: remove the whitespace change.

@@ -33,7 +33,7 @@ import org.apache.spark.sql.rapids.execution.TrampolineUtil
import org.apache.spark.sql.types._

object HybridExecutionUtils extends PredicateHelper {

Collaborator:

Remove the whitespace change.

Author:

Actually, this PR removes it, but I'll add it back to reduce the diff size.

}
Collaborator:

Nit: remove the unrelated change.

@@ -57,7 +58,7 @@ class RapidsCsvScanMeta(
parent: Option[RapidsMeta[_, _, _]],
rule: DataFromReplacementRule)
extends ScanMeta[CSVScan](cScan, conf, parent, rule) {

Collaborator:

Nit: remove the whitespace change.

}
Collaborator:

Nit: remove the unrelated change.

@@ -65,7 +66,7 @@ object SparkShimImpl extends Spark340PlusNonDBShims {
GpuToPrettyString(child)
}
}
}),
}),
Collaborator:

Nit: remove the unrelated change.

@@ -116,13 +117,13 @@ class RegularExpressionSuite extends SparkQueryCompareTestSuite {
}

testSparkResultsAreEqual("String regexp_extract regex 1", extractStrings, conf = conf) {
frame =>
frame =>
Collaborator:

Nit: remove the unrelated change.

assume(isUnicodeEnabled())
frame.selectExpr("regexp_extract(strings, '^([a-z]*)([0-9]*)([a-z]*)$', 1)")
}

testSparkResultsAreEqual("String regexp_extract regex 2", extractStrings, conf = conf) {
frame =>
frame =>
Collaborator:

Nit: remove the unrelated change.

@@ -44,7 +45,7 @@ class ToPrettyStringSuite extends GpuUnitTests {
val numRows = 100
val inputRows = GpuBatchUtilsSuite.createRows(schema, numRows)
val cpuOutput: Array[String] = inputRows.map {
input =>
Collaborator:

Nit: remove the unrelated change.

@@ -39,6 +39,7 @@
{"spark": "350"}
{"spark": "350db143"}
{"spark": "351"}
{"spark": "351odp"}
@kuhushukla (Collaborator), May 1, 2025:

I wonder if there is a better way to add ODP versions that repeat across many of the files in this PR. I will defer that to the other reviewers, but it would certainly be a nice thing to have.

Author:

Yes, it was painful for me too.
Would love to know if there's a better way to handle this, or even better, if someone already has a script for it. Happy to learn!

Collaborator:

Typically when we add a new shim, we know which existing shim it will mostly be based on: https://github.com/NVIDIA/spark-rapids/blob/branch-25.06/docs/dev/shimplify.md#adding-a-new-shim. So most of this tagging is not done by hand.

@prabhjyotsingh (Author), May 2, 2025:

Thank you @gerashegalov, I will definitely use it next time.
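For context, the shimplify workflow referenced above generates a new shim's per-file version tags from the closest existing base shim instead of editing them by hand. A rough sketch of that invocation follows; the exact property names should be verified against the linked shimplify.md, and using `351` as the base for `351odp` is an assumption based on this PR:

```shell
# Hypothetical sketch per docs/dev/shimplify.md: ask the build to add
# the {"spark": "351odp"} tag everywhere the existing 351 shim applies,
# rewriting the per-file shim comments automatically.
mvn generate-sources \
  -Dshimplify=true \
  -Dshimplify.move=true \
  -Dshimplify.add.shim=351odp \
  -Dshimplify.add.base=351
```

This avoids the manual, repetitive tagging across many files that the reviewer is asking about.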

@@ -27,6 +27,11 @@ case class ClouderaShimVersion(major: Int, minor: Int, patch: Int, clouderaVersi
override def toString(): String = s"$major.$minor.$patch-cloudera-$clouderaVersion"
}

case class AcceldataShimVersion(major: Int, minor: Int, patch: Int, acceldataVersion: String)
Collaborator:

You may want to decompose acceldataVersion into int dimensions, just like the Spark version, so that cmpSparkVersion compares properly even between different instances of AcceldataShimVersion.

Author:

Sure, let me try to fix it.

Author:

@gerashegalov I have made the required changes. Does this look better?
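The decomposition the reviewer suggested could look roughly like the sketch below. This is illustrative only, not the actual patch; the `odpMajor`/`odpMinor`/`odpPatch` field names and the sample version numbers are assumptions:

```scala
// Sketch: splitting the vendor suffix into integer dimensions so that
// comparisons between two AcceldataShimVersion instances are numeric
// rather than string-based.
case class AcceldataShimVersion(
    major: Int, minor: Int, patch: Int,
    odpMajor: Int, odpMinor: Int, odpPatch: Int) {
  override def toString(): String =
    s"$major.$minor.$patch-$odpMajor.$odpMinor.$odpPatch"
  // A tuple of all dimensions gives most-significant-first ordering.
  def asTuple: (Int, Int, Int, Int, Int, Int) =
    (major, minor, patch, odpMajor, odpMinor, odpPatch)
}

object VersionDemo extends App {
  import scala.math.Ordering.Implicits._
  val older = AcceldataShimVersion(3, 5, 1, 3, 2, 9)
  val newer = AcceldataShimVersion(3, 5, 1, 3, 10, 0)
  // Numerically 3.2.9 precedes 3.10.0, though "3.2.9" > "3.10.0"
  // as strings; integer dimensions avoid that trap.
  assert(older.asTuple < newer.asTuple)
  println(older)
}
```

Keeping the vendor part as a single string would make version comparison lexicographic, which breaks as soon as any component reaches two digits.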

@prabhjyotsingh prabhjyotsingh force-pushed the branch-25.06-351odp branch from 498204e to f21c60f Compare May 5, 2025 19:00
@prabhjyotsingh (Author)

@gerashegalov @kuhushukla could you please help with a code review for this PR? Also, let me know if there's anything else needed from my side to move this forward.
Thanks!

@prabhjyotsingh (Author)

@gerashegalov @kuhushukla could you help me with what else is required here?

@gerashegalov (Collaborator)

Sorry for the delay; there is a discussion about accepting additional Spark vendor shims, in order to reduce instances like checkNotRunningCDHorDatabricks.

@sameerz sameerz added the feature request New feature or request label Jun 2, 2025
@sameerz (Collaborator) commented Jun 2, 2025

Will need to be rebased to branch-25.08 as branch-25.06 is in burndown.

Signed-off-by: Prabhjyot Singh <[email protected]>

(cherry picked from commit 1a9c55a)
Signed-off-by: Prabhjyot Singh <[email protected]>

(cherry picked from commit 5c335c4)
Signed-off-by: Prabhjyot Singh <[email protected]>

(cherry picked from commit 70cb32a)
Signed-off-by: Prabhjyot Singh <[email protected]>

(cherry picked from commit f21c60f)
Signed-off-by: Prabhjyot Singh <[email protected]>

(cherry picked from commit 14798b1)
@prabhjyotsingh prabhjyotsingh changed the base branch from branch-25.06 to branch-25.08 June 3, 2025 16:12
@prabhjyotsingh (Author)

@sameerz Sure, I've rebased this onto branch-25.08.

Labels
feature request New feature or request
4 participants