GH-46513: [Archery] Add external library support in Archery #46530

Open
Alex-PLACET wants to merge 16 commits into apache:main from Alex-PLACET:archery_supports_external_libraries

Conversation

@Alex-PLACET Alex-PLACET commented May 21, 2025

Rationale for this change

Address #46513:
Add the ability to use a library that is not one of the "official" implementations in the integration tests.

What changes are included in this PR?

Add new CLI options to define the path to the library under test and its compatibility with the tests (IPC producer/consumer, C Data array/schema importer/exporter).
Add tester_external_library.py
Modify runner.py to use ExternalLibraryTester

Are these changes tested?

Yes, locally and on the Sparrow CI:
https://github.com/man-group/sparrow/pull/426/checks#step:9:1
https://github.com/man-group/sparrow/pull/426/checks#step:9:7051

Are there any user-facing changes?

Yes: there are new CLI options, and an environment variable is supported.
There is no breaking change.
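
For readers who want a picture of the kind of CLI surface described above, here is a rough sketch. It uses the standard-library argparse rather than the click decorators the PR itself uses, so that it runs without extra dependencies. The two `--external-library-ipc-*` names follow the review suggestions in this conversation; `--external-library-path` and the `parse_bool` helper are hypothetical names chosen for the example and may not match the PR.

```python
import argparse


def parse_bool(value: str) -> bool:
    """Parse a command-line string such as 'true' or '0' into a bool."""
    lowered = value.strip().lower()
    if lowered in ("1", "true", "yes", "on"):
        return True
    if lowered in ("0", "false", "no", "off"):
        return False
    raise argparse.ArgumentTypeError(f"not a boolean: {value!r}")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="integration-sketch")
    # Hypothetical option: location of the external library on disk.
    parser.add_argument("--external-library-path", default=None)
    # Capability flags: which roles the external library can play.
    parser.add_argument(
        "--external-library-ipc-producer", type=parse_bool, default=False
    )
    parser.add_argument(
        "--external-library-ipc-consumer", type=parse_bool, default=False
    )
    return parser


# Example invocation with an assumed library path.
args = build_parser().parse_args(
    ["--external-library-path", "/tmp/libexample.so",
     "--external-library-ipc-producer", "true"]
)
```

In this sketch each capability defaults to off, so a library only takes part in the test roles it explicitly declares.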

@github-actions

Thanks for opening a pull request!

If this is not a minor PR, could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose

Opening GitHub issues ahead of time contributes to the Openness of the Apache Arrow project.

Then could you also rename the pull request title in the following format?

GH-${GITHUB_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}

or

MINOR: [${COMPONENT}] ${SUMMARY}

See also:

@Alex-PLACET Alex-PLACET changed the title from "Add external library support in Archery" to "[GH-46513]:Add external library support in Archery" on May 21, 2025
@github-actions github-actions bot added the "awaiting review" label on May 21, 2025
@Alex-PLACET Alex-PLACET changed the title from "[GH-46513]:Add external library support in Archery" to "GH-46513: [Archery] Add external library support in Archery" on May 21, 2025
@github-actions

⚠️ GitHub issue #46513 has been automatically assigned in GitHub to PR creator.

@Alex-PLACET Alex-PLACET marked this pull request as ready for review May 27, 2025 13:14
Member

@kou kou left a comment

Could you add documentation on how to use this?

Member

Suggested change
- @click.option('--external-library-IPC-producer', type=bool, default=False,
+ @click.option('--external-library-ipc-producer', type=bool, default=False,

Author

Ok fixed

Member

Suggested change
- @click.option('--external-library-IPC-consumer', type=bool, default=False,
+ @click.option('--external-library-ipc-consumer', type=bool, default=False,

Author

Ok fixed

Member

Do we need this...?

Suggested change
- enabled_implementations = True
+ if args[param]:
+     enabled_implementations = True

Author

I finally simplified the code to raise an error if no implementation is selected

Member

Suggested change
- if enabled_formats == False:
+ if not enabled_formats:

Author

I finally simplified the code to raise an error if no format is selected

Member

Suggested change
- if enabled_implementations == False:
+ if not enabled_implementations:

Author

fixed

Member

If we want to use a bool for this, we may need to rename it to something like have_enabled_implementation.

Author

fixed
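
The check discussed in this thread could look roughly like the sketch below. It is only an illustration under assumptions: `args` is taken to be a dict of parsed CLI values, `IMPLEMENTATION_PARAMS` and `check_implementation_selected` are hypothetical names, and the choice of `ValueError` is a guess rather than what the PR actually raises.

```python
# Hypothetical list of CLI parameters that each enable one implementation.
IMPLEMENTATION_PARAMS = ["with_cpp", "with_go", "with_external_library"]


def check_implementation_selected(args: dict) -> None:
    """Raise ValueError if none of the implementation flags is truthy."""
    # any() returns a plain bool, so the "have_" prefix describes it well
    # and no extra type annotation is needed.
    have_enabled_implementation = any(
        args.get(param) for param in IMPLEMENTATION_PARAMS
    )
    if not have_enabled_implementation:
        raise ValueError("no implementation selected")


# Passes silently because one flag is set.
check_implementation_selected({"with_cpp": True, "with_go": False})
```

Raising early keeps the later test-selection code free of "nothing enabled" special cases.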

@github-actions github-actions bot added the "awaiting changes" and "awaiting change review" labels and removed the "awaiting review" and "awaiting changes" labels on May 28, 2025
@Alex-PLACET Alex-PLACET requested a review from kou May 28, 2025 12:38
@kou
Member

kou commented Jun 4, 2025

Could you add documentation on how to use this?

Member

We already have docs for the integration tests in https://github.com/apache/arrow/blob/main/docs/source/format/Integration.rst; I suggest augmenting that document rather than the README.

Member

Can you ensure you wrap long lines to a maximum of ~90 chars? This makes the code more readable, especially in diff views.

Member

The Final annotation seems pedantic to me. Besides, any returns a boolean, so no annotation should be required.

Member

Same here.

@pitrou
Member

pitrou commented Jun 4, 2025

A more general comment. The approach in this PR is a bit inflexible, as any implementation-specific setting is exposed as a dedicated CLI option. This may proliferate CLI options as we need more settings to be exposed.

For example, some per-implementation skips are currently hardcoded in the data generation (example). Perhaps we would like an external implementation to tell which tests it needs to skip. That will complicate the CLI quite a bit.

Another possibility is to allow passing a custom YAML file that records all the implementation-specific settings. The CLI would only have one additional argument and the YAML would carry the rest of the information in a structured way.

But of course, Sparrow might remain the only user of this functionality, so it's also up to you. What do you think @Alex-PLACET ?
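
To make the YAML idea above a little more concrete, here is a minimal sketch of the settings such a file might carry once loaded. Everything in it is an assumption for illustration: the key names (`name`, `path`, `ipc_producer`, `ipc_consumer`, `skip`), the `ExternalImplementationConfig` class, and the sample test names are hypothetical. The plain dict stands in for whatever a YAML loader would return.

```python
from dataclasses import dataclass, field


@dataclass
class ExternalImplementationConfig:
    """Hypothetical per-implementation settings read from a config file."""
    name: str
    path: str
    ipc_producer: bool = False
    ipc_consumer: bool = False
    skip: set = field(default_factory=set)

    @classmethod
    def from_mapping(cls, data: dict) -> "ExternalImplementationConfig":
        # Missing optional keys fall back to "capability off" / "skip nothing".
        return cls(
            name=data["name"],
            path=data["path"],
            ipc_producer=bool(data.get("ipc_producer", False)),
            ipc_consumer=bool(data.get("ipc_consumer", False)),
            skip=set(data.get("skip", [])),
        )

    def should_skip(self, test_name: str) -> bool:
        """Return True if this implementation asked to skip the given test."""
        return test_name in self.skip


# Stand-in for the result of parsing a YAML file (all values are made up).
loaded = {
    "name": "example-lib",
    "path": "/path/to/libexample.so",
    "ipc_producer": True,
    "skip": ["decimal256"],
}
config = ExternalImplementationConfig.from_mapping(loaded)
```

With this shape, the per-implementation skips travel with the implementation's own file, and the CLI needs only one extra argument pointing at it.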

@Alex-PLACET Alex-PLACET force-pushed the archery_supports_external_libraries branch from 94279c2 to 06475b5 on July 9, 2025 08:00
@pitrou
Member

pitrou commented Aug 25, 2025

@Alex-PLACET Would you like to comment on #46530 (comment)? I think we'll need something more flexible anyway, because at some point some skips will be required.

@Alex-PLACET
Author

Apologies for the delayed response.
You’re right, the skipping feature is indeed missing. This wasn’t an issue for Sparrow, as its primary goal was full compatibility.
The main purpose of these changes is to enable non-Apache developers to test their libraries in their own CI environments. We’re already using this branch in our CI to run Sparrow against the Apache Arrow C++ implementation.
Using a YAML file could be a good solution, but I don’t have the bandwidth to implement it at the moment.
If you want, we can close this pull request and create a new one for the YAML later.

@pitrou
Member

pitrou commented Aug 25, 2025

> If you want, we can close this pull request and create a new one for the YAML later.

What do you think @kou @raulcd ?

@kou
Member

kou commented Aug 26, 2025

If we use the YAML approach, I want to replace all the existing testers with YAML configurations.
In that case, we may want to specify multiple YAML files. For example:

apache/arrow provides one YAML file for all known implementations:

archery integration ... --config /.../apache/arrow/dev/archery/data/builtins.yaml ...

apache/arrow provides one YAML file per known implementation:

archery integration ... --config=/.../apache/arrow/dev/archery/data/{cpp,go,js}.yaml ...

Each repository provides a YAML file for their implementation:

archery integration ... \
  --config=/.../apache/arrow/dev/archery/data/cpp.yaml \
  --config=/.../apache/arrow-go/integration/archery.yaml \
  --config=/.../apache/arrow-js/integration/archery.yaml \
  ...
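
A repeated `--config` option like the one in the examples above could be collected along the following lines. This is a sketch using the standard-library argparse, not Archery's click-based CLI. The `build_config_parser` and `merge_configs` names are hypothetical, and the "later file wins" merge rule is an assumption that the thread does not settle.

```python
import argparse


def build_config_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="integration-sketch")
    # action="append" gathers every --config occurrence into one list.
    parser.add_argument(
        "--config", action="append", default=[], dest="configs"
    )
    return parser


def merge_configs(loaded_files: list) -> dict:
    """Merge mappings keyed by implementation name; later files win."""
    merged = {}
    for mapping in loaded_files:
        merged.update(mapping)
    return merged


# Both "--config=PATH" and "--config PATH" spellings are accepted.
ns = build_config_parser().parse_args(
    ["--config=cpp.yaml", "--config", "external.yaml"]
)
```

Each path in `ns.configs` would then be loaded and the resulting mappings passed to `merge_configs`, whether there is one file for all implementations or one file per implementation.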

@pitrou
Member

pitrou commented Aug 26, 2025

> If we use the YAML approach, I want to replace all the existing testers with YAML configurations.

This sounds like an order of magnitude more difficult than simply supporting YAML for third-party (external) implementations. For example, the Java tester uses JPype to load and access the JVM in-process from Python.

@Alex-PLACET Alex-PLACET force-pushed the archery_supports_external_libraries branch from 06475b5 to 749dc7c on November 4, 2025 13:00
3 participants