GH-46513: [Archery] Add external library support in Archery#46530
GH-46513: [Archery] Add external library support in Archery#46530Alex-PLACET wants to merge 16 commits intoapache:mainfrom
Conversation
|
Thanks for opening a pull request! If this is not a minor PR. Could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose Opening GitHub issues ahead of time contributes to the Openness of the Apache Arrow project. Then could you also rename the pull request title in the following format? or See also: |
|
|
kou
left a comment
There was a problem hiding this comment.
Could you add a document how to use this?
dev/archery/archery/cli.py
Outdated
There was a problem hiding this comment.
| @click.option('--external-library-IPC-producer', type=bool, default=False, | |
| @click.option('--external-library-ipc-producer', type=bool, default=False, |
dev/archery/archery/cli.py
Outdated
There was a problem hiding this comment.
| @click.option('--external-library-IPC-consumer', type=bool, default=False, | |
| @click.option('--external-library-ipc-consumer', type=bool, default=False, |
dev/archery/archery/cli.py
Outdated
There was a problem hiding this comment.
Do we need this...?
| enabled_implementations = True | |
| if args[param]: | |
| enabled_implementations = True |
There was a problem hiding this comment.
I finally simplified the code to raise an error if no implementation is selected
dev/archery/archery/cli.py
Outdated
There was a problem hiding this comment.
| if enabled_formats == False: | |
| if not enabled_formats: |
There was a problem hiding this comment.
I finally simplified the code to raise an error if no format is selected
dev/archery/archery/cli.py
Outdated
There was a problem hiding this comment.
| if enabled_implementations == False: | |
| if not enabled_implementations: |
dev/archery/archery/cli.py
Outdated
There was a problem hiding this comment.
If we want to use bool for this, we may need to rename this something like have_enabled_implementation.
|
Could you add a document how to use this? |
dev/archery/README.md
Outdated
There was a problem hiding this comment.
We already have docs for the Integration tests in https://github.com/apache/arrow/blob/main/docs/source/format/Integration.rst, I suggest augmenting this document rather than the README.
dev/archery/archery/cli.py
Outdated
There was a problem hiding this comment.
Can you ensure you wrap long lines to max 90 ~chars? This makes the code more readable, especially in diff views.
dev/archery/archery/cli.py
Outdated
There was a problem hiding this comment.
The Final annotation seems pedantic to me. Besides, any returns a boolean, so no annotation should be required.
dev/archery/archery/cli.py
Outdated
|
A more general comment. The approach in this PR is a bit inflexible, as any implementation-specific setting is exposed as a dedicated CLI option. This may proliferate CLI options as we need more settings to be exposed. For example, some per-implementation skips are currently hardcoded in the data generation (example). Perhaps we would like an external implementation to tell which tests it needs to skip. That will complicate the CLI quite a bit. Another possibility is to allow passing a custom YAML file that records all the implementation-specific settings. The CLI would only have one additional argument and the YAML would carry the rest of the information in a structured way. But of course, Sparrow might remain the only user of this functionality, so it's also up to you. What do you think @Alex-PLACET ? |
94279c2 to
06475b5
Compare
|
@Alex-PLACET Would you like to comment on #46530 (comment) ? I think we'll need something more flexible anyway, because at some points some skips will be required. |
|
Apologies for the delayed response. |
|
If we use the YAML approach, I want to replace the all existing testers with YAML configurations. apache/arrow provides one YAML file for all known implementations: archery integration ... --config /.../apache/arrow/dev/archery/data/builtins.yaml ...apache/arrow provides one YAML file per known implementation: archery integration ... --config=/.../apache/arrow/dev/archery/data/{cpp,go,js}.yaml ...Each repository provides a YAML file for their implementation: archery integration ... \
--config=/.../apache/arrow/dev/archery/data/cpp.yaml
--config=/.../apache/arrow-go/integration/archery.yaml
--config=/.../apache/arrow-js/integration/archery.yaml
... |
This sounds an order of magnitude more difficult than simply supporting YAML for third-party (external) implementations. For example, the Java tester uses JPype to load and access the JVM in-process from Python. |
06475b5 to
749dc7c
Compare
Rationale for this change
Address #46513:
Add the possibility to use a library which is not part of the "official" implementations, in the integration tests
What changes are included in this PR?
Add new options in the CLI to be able to define the path to a library to test, and it's compatibility with the tests (IPC producer/consumer, c data array/schema impoter/exporter).
Add tester_external_library.py
Modify runner.py to use ExternalLibraryTester
Are these changes tested?
Yes localy and on the Sparrow CI:
https://github.com/man-group/sparrow/pull/426/checks#step:9:1
https://github.com/man-group/sparrow/pull/426/checks#step:9:7051
Are there any user-facing changes?
Yes, new options in the CLI and environment variable supported.
there is no breaking change.