
[Bug] The inclusion of high-version Spark classes in paimon-spark-common may lead to certain exceptions. #5037

thomasg19930417 opened this issue Feb 8, 2025 · 4 comments
Labels
bug Something isn't working


Search before asking

  • I searched in the issues and found nothing similar.

Paimon version

1.0.0

Compute Engine

Spark 3.3

Minimal reproduce step

Execute a desc tableName or show create table statement.

What doesn't meet your expectations?

scala> spark.sql("show create table tableName").show(false)
java.lang.NoClassDefFoundError: [Lorg/apache/spark/sql/connector/catalog/Column;
at java.lang.Class.getDeclaredMethods0(Native Method)
at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
at java.lang.Class.privateGetMethodRecursive(Class.java:3048)
at java.lang.Class.getMethod0(Class.java:3018)
at java.lang.Class.getMethod(Class.java:1784)
at org.apache.kyuubi.plugin.spark.authz.util.AuthZUtils$.invoke(AuthZUtils.scala:63)
at org.apache.kyuubi.plugin.spark.authz.util.AuthZUtils$.invokeAs(AuthZUtils.scala:77)
at org.apache.kyuubi.plugin.spark.authz.serde.TableExtractor$.getOwner(tableExtractors.scala:50)
at org.apache.kyuubi.plugin.spark.authz.serde.ResolvedTableTableExtractor.apply(tableExtractors.scala:103)
at org.apache.kyuubi.plugin.spark.authz.serde.ResolvedTableTableExtractor.apply(tableExtractors.scala:97)
at org.apache.kyuubi.plugin.spark.authz.serde.TableDesc.extract(Descriptor.scala:244)
at org.apache.kyuubi.plugin.spark.authz.PrivilegesBuilder$.getTablePriv$1(PrivilegesBuilder.scala:128)
at org.apache.kyuubi.plugin.spark.authz.PrivilegesBuilder$.$anonfun$buildCommand$7(PrivilegesBuilder.scala:174)
at scala.collection.immutable.List.foreach(List.scala:431)
at org.apache.kyuubi.plugin.spark.authz.PrivilegesBuilder$.buildCommand(PrivilegesBuilder.scala:172)
at org.apache.kyuubi.plugin.spark.authz.PrivilegesBuilder$.build(PrivilegesBuilder.scala:224)
at org.apache.kyuubi.plugin.spark.authz.ranger.RuleAuthorization$.checkPrivileges(RuleAuthorization.scala:50)
at org.apache.kyuubi.plugin.spark.authz.ranger.RuleAuthorization.apply(RuleAuthorization.scala:36)
at org.apache.kyuubi.plugin.spark.authz.ranger.RuleAuthorization.apply(RuleAuthorization.scala:33)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$2(RuleExecutor.scala:211)
at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
at scala.collection.immutable.List.foldLeft(List.scala:91)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1(RuleExecutor.scala:208)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1$adapted(RuleExecutor.scala:200)
at scala.collection.immutable.List.foreach(List.scala:431)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:200)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$executeAndTrack$1(RuleExecutor.scala:179)
at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:88)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.executeAndTrack(RuleExecutor.scala:179)
at org.apache.spark.sql.execution.QueryExecution.$anonfun$optimizedPlan$1(QueryExecution.scala:126)
at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111)
at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$2(QueryExecution.scala:185)
at org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:510)
at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:185)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:184)
at org.apache.spark.sql.execution.QueryExecution.optimizedPlan$lzycompute(QueryExecution.scala:122)
at org.apache.spark.sql.execution.QueryExecution.optimizedPlan(QueryExecution.scala:118)
at org.apache.spark.sql.execution.QueryExecution.assertOptimized(QueryExecution.scala:136)
at org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:154)
at org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:151)
at org.apache.spark.sql.execution.QueryExecution.simpleString(QueryExecution.scala:204)
at org.apache.spark.sql.execution.QueryExecution.org$apache$spark$sql$execution$QueryExecution$$explainString(QueryExecution.scala:249)
at org.apache.spark.sql.execution.QueryExecution.explainString(QueryExecution.scala:218)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:103)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:169)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:95)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:98)
at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:94)
at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:584)
at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:176)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:584)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:560)
at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:94)
at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:81)
at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:79)
at org.apache.spark.sql.Dataset.<init>(Dataset.scala:220)
at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:100)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:97)
at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:622)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:617)
... 47 elided
Caused by: java.lang.ClassNotFoundException: org.apache.spark.sql.connector.catalog.Column
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 118 more

Anything else?

I'm using Kyuubi's Spark authorization plugin. I found that the exception occurs during permission checks because Paimon's SparkTable class carries references to classes that only exist in newer Spark versions. Comparing the implementations across versions: in Paimon 0.8 and earlier, SparkTable was a Java class; since 0.9 it has been rewritten in Scala, and after compilation it imports classes (here org.apache.spark.sql.connector.catalog.Column, which Spark 3.3 does not ship) that only exist in newer Spark versions.
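
To illustrate the mechanism, here is a minimal sketch with hypothetical classes (Missing, Table, and Repro are illustrations, not Paimon or Kyuubi code). Class.getMethod resolves the signatures of all declared methods of the class, so a single method whose return type is absent from the runtime classpath fails the whole lookup, even though the method actually being invoked never touches that type:

```java
// Compile all three classes together, then delete Missing.class and run Repro:
// the lookup of owner() fails with a NoClassDefFoundError for the Missing
// array type, because getDeclaredMethods0 must resolve every method signature
// of Table before getMethod can return.
class Missing {}                                 // plays the role of Spark 3.4's Column

class Table {
    public String owner() { return "alice"; }    // the method the caller actually wants
    public Missing[] columns() { return null; }  // signature referencing the absent class
}

public class Repro {
    public static void main(String[] args) throws Exception {
        // Throws NoClassDefFoundError before owner() can be invoked reflectively.
        Object owner = Table.class.getMethod("owner").invoke(new Table());
        System.out.println(owner);
    }
}
```

This matches the stack trace above: the error is raised inside Class.getDeclaredMethods0 while Kyuubi's AuthZUtils is only trying to reflectively read the table owner.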

Are you willing to submit a PR?

  • I'm willing to submit a PR!
@thomasg19930417 added the bug label Feb 8, 2025

@thomasg19930417 (Author) commented Feb 10, 2025

We should keep using Java for the implementation here; that should avoid the import issues introduced by Scala compilation against a newer Spark. Alternatively, would it be possible to build against specific Spark versions and produce a different common JAR per version?
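
As a side note, a different mitigation pattern (a sketch of a general defensive technique, not Kyuubi's or Paimon's actual code) is to probe whether the newer class can be loaded before reflecting over classes whose method signatures mention it:

```java
// Hypothetical helper, for illustration only. Class.forName with
// initialize=false merely checks that the class can be located, so on a
// Spark 3.3 classpath this returns false instead of letting a later
// reflective call fail with NoClassDefFoundError.
public final class SparkVersionProbe {
    static boolean hasCatalogColumn(ClassLoader cl) {
        try {
            Class.forName("org.apache.spark.sql.connector.catalog.Column", false, cl);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(hasCatalogColumn(SparkVersionProbe.class.getClassLoader()));
    }
}
```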

@thomasg19930417 (Author) commented

Attempts to change the POM to the corresponding Spark version resulted in compilation failures, because some features from newer Spark versions are used directly in the code.

@thomasg19930417 (Author) commented

@JingsongLi Could you spare some time to take a look at this issue? It is currently blocking our upgrade to Paimon 1.0.

@thomasg19930417 (Author) commented

This problem was solved by adding an empty implementation of Column in paimon-spark3.3.
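
For reference, a sketch of what such a placeholder might look like (hypothetical; the actual patch may differ): an empty type with the fully qualified name of the Spark 3.4+ class, compiled into the paimon-spark3.3 module purely so the JVM can resolve SparkTable's method signatures. Nothing on Spark 3.3 ever uses it:

```java
// Placeholder sketch, not necessarily the committed fix. Its only job is to
// be loadable, so that reflective method resolution over Paimon's SparkTable
// succeeds on a Spark 3.3 classpath; no Spark 3.3 code path ever calls it.
package org.apache.spark.sql.connector.catalog;

public interface Column {
}
```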
