Fix NOTICE and LICENSE in the spark-runtime jar #12160

Merged
merged 1 commit into from
Feb 6, 2025

Conversation

jbonofre
Member

@jbonofre jbonofre commented Feb 2, 2025

This PR does:

  • Update Avro copyright
  • Remove paranamer (not found in the jar)
  • Update Parquet copyright
  • Update Thrift copyright
  • Update ORC copyright
  • Update Hive copyright
  • Remove airlift slice (not found in the jar)
  • Remove presto (not found in the jar)
  • Remove findbugs (jsr305) (not found in the jar)
  • Remove j2objc (not found in the jar)
  • Remove animal sniffer annotations (not found in the jar)
  • Remove carrot (not found in the jar)
  • Remove lucene (not found in the jar)
  • Remove yetus (not found in the jar)
  • Update Nessie copyright
  • Remove delta (not found in the jar)
  • Remove Apache projects from NOTICE
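
Several items above were dropped because they were "not found in the jar". A minimal sketch of how such a check might be done, assuming the runtime jar is available locally: list the jar's entries and look for each package path. The function name `find_package_entries` and the package paths in the usage note are illustrative assumptions, not part of this PR. The match is a substring test so that relocated (shaded) copies of a package are also reported.

```python
import zipfile


def find_package_entries(jar_path, package_paths):
    """Map each package path to the jar entries that contain it.

    A substring match is used (rather than a prefix match) so that
    relocated/shaded copies of a package are reported as well.
    """
    with zipfile.ZipFile(jar_path) as jar:
        names = jar.namelist()
    return {
        path: [name for name in names if path in name]
        for path in package_paths
    }
```

For example, `find_package_entries("iceberg-spark-runtime.jar", ["com/thoughtworks/paranamer/", "io/airlift/slice/"])` (hypothetical jar file name) returns an empty list for any package path that has no entries in the jar, which would be consistent with removing it from LICENSE and NOTICE.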

@github-actions github-actions bot added the spark label Feb 2, 2025
@jbonofre jbonofre force-pushed the notice-fix-spark-runtime branch from e366cdf to 8efaf1c Compare February 2, 2025 10:44
@@ -500,9 +363,52 @@ file:
This binary artifact includes Project Nessie with the following in its NOTICE
file:

| NOTICE applicable to Nessie source
Member Author

Update Nessie NOTICE from the latest version.

Contributor

As in #12145, I think we need Nessie to update license documentation before we can conclude that this is accurate. The current NOTICE for binary artifacts (that we use) still includes a line about GPL.

Member Author

I've updated the Nessie NOTICE to match the 0.102.5 release.

Contributor

@amogh-jahagirdar amogh-jahagirdar left a comment

I think it mostly looks great, just one comment on keeping the mention of Presto in the LICENSE. Thank you @jbonofre !

spark/v3.5/spark-runtime/LICENSE (outdated, resolved)
@rdblue
Contributor

rdblue commented Feb 5, 2025

The description for this issue has:

Remove delta (not found in the jar)

I don't see the change anymore, but in case I'm missing it: this was included because source code is partially based on a Delta class, so to avoid any doubt we wanted to include it:

AssignmentAlignmentSupport is an independent development but UpdateExpressionsSupport in Delta was used as a reference.

@jbonofre
Member Author

jbonofre commented Feb 5, 2025

> The description for this issue has:
>
> Remove delta (not found in the jar)
>
> I don't see the change anymore, but in case I'm missing it: this was included because source code is partially based on a Delta class, so to avoid any doubt we wanted to include it:
>
> AssignmentAlignmentSupport is an independent development but UpdateExpressionsSupport in Delta was used as a reference.

Yes, that was my mistake. I re-added the Delta mention when I saw the code referenced in the root LICENSE, so we are clean now.

@jbonofre jbonofre force-pushed the notice-fix-spark-runtime branch from 163f5c2 to 497f6b8 Compare February 5, 2025 19:49
| Copyright 2015-2025 Dremio Corporation
|
| ---------------------------------------
| This project includes code from Apache Iceberg, with the following in its NOTICE file:
Contributor

This can be trimmed because it is from our own NOTICE. We are responsible for all the content about what Iceberg ships, so there is no need to duplicate.

Member Author

I trimmed the Nessie NOTICE by removing Iceberg and Netty (Netty is already mentioned in the Iceberg Spark runtime NOTICE). I kept the Polaris mention because it's not a duplicate (admittedly, it's about build/Gradle resources in Polaris, but it's better to keep it for consistency).

| | to the ASF by Snowflake Inc. (https://www.snowflake.com/) copyright 2024.
|
| ---------------------------------------
| This project includes code from Netty, with the following in its NOTICE file:
Contributor

It's too bad that this was based on a source inclusion in Nessie. Nessie could probably check that none of this applies to ResolveConf that was copied to avoid more work for downstream bundlers.

Member Author

@jbonofre jbonofre Feb 6, 2025

I think Nessie is using a similar approach to Iceberg: as discussed, Iceberg uses the LICENSE and NOTICE from the source in its jar files by default (only the bundle and runtime jars have specific LICENSE and NOTICE files).
For instance, Iceberg mentions ScriptRunner, which is used for the iceberg-hive-metastore tests, but it is mentioned in all Iceberg jars. Likewise, AssignmentAlignmentSupport is used only in iceberg-spark-extensions but is mentioned in all Iceberg jars.
So I'm not very comfortable asking Nessie to change that when Iceberg is doing the same (I did ask, and the Nessie team said they will plan it, as it's more work).

Contributor

The difference is that we're more selective with NOTICE content when code is copied. That's because we can more easily verify that sections of the original project's NOTICE do or do not apply to the specific section that was copied or used.

@jbonofre jbonofre force-pushed the notice-fix-spark-runtime branch from 497f6b8 to f574393 Compare February 6, 2025 07:01
@jbonofre jbonofre force-pushed the notice-fix-spark-runtime branch from f574393 to 126bf34 Compare February 6, 2025 07:03
@Fokko
Contributor

Fokko commented Feb 6, 2025

I'm seeing

This binary artifact contains code from Daniel Lemire's JavaFastPFOR project.

Copyright: 2013 Daniel Lemire
Home page: https://github.com/lemire/JavaFastPFOR
License: Apache License Version 2.0 http://www.apache.org/licenses/LICENSE-2.0

In the license, but I'm not able to find this in the dependency tree

@amogh-jahagirdar
Contributor

> I'm seeing
>
> This binary artifact contains code from Daniel Lemire's JavaFastPFOR project.
>
> Copyright: 2013 Daniel Lemire
> Home page: https://github.com/lemire/JavaFastPFOR
> License: Apache License Version 2.0 http://www.apache.org/licenses/LICENSE-2.0
>
> In the license, but I'm not able to find this in the dependency tree

The Parquet License still references JavaFastPFOR https://github.com/apache/parquet-java/blob/parquet-1.15.x/LICENSE#L189 and I still see some references to the source in Parquet https://github.com/apache/parquet-java/blob/fb6f0be0323f5f52715b54b8c6602763d8d0128d/parquet-generator/src/main/java/org/apache/parquet/encoding/bitpacking/IntBasedBitPackingGenerator.java#L29 . I think we'd need to keep this, until we update Parquet (based on the code comment it looks like that code path I referenced is no longer really used upstream)?
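
A cross-check like the one above can also be scripted against a dependency's jar. The following is only a sketch, assuming the dependency ships its license text inside the jar in entries whose file names start with LICENSE or NOTICE. The function name `metadata_mentions` is hypothetical, and the check is a case-insensitive text search, nothing more.

```python
import zipfile


def metadata_mentions(jar_path, needle):
    """Return the LICENSE/NOTICE-like jar entries whose text mentions `needle`."""
    hits = []
    with zipfile.ZipFile(jar_path) as jar:
        for name in jar.namelist():
            # Only inspect entries whose file name starts with LICENSE or NOTICE.
            base = name.rsplit("/", 1)[-1].upper()
            if not (base.startswith("LICENSE") or base.startswith("NOTICE")):
                continue
            text = jar.read(name).decode("utf-8", errors="replace")
            if needle.lower() in text.lower():
                hits.append(name)
    return hits
```

Calling it with a locally downloaded Parquet jar and `"JavaFastPFOR"` as the needle would list the bundled license files that still name the project. An empty result only means the name was not found in those files, not that the code is absent.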

Contributor

@Fokko Fokko left a comment

Thanks for the clarification @amogh-jahagirdar, I need to work on my grep skills :D

@Fokko Fokko added this to the Iceberg 1.8.0 milestone Feb 6, 2025
@rdblue rdblue merged commit c153741 into apache:main Feb 6, 2025
31 checks passed