Description
As discussed in issue #804, the system-level libarrow.so provided in standard manylinux environments (or installed via system package managers) is often incomplete or lacks necessary components for our use case.
A more robust solution is to link graphar.so directly against the libarrow.so bundled within the pyarrow python package. This ensures we are using a full-featured Arrow library that matches the Python environment.
However, adopting this approach introduces several significant build and runtime challenges described below.
The dependency relationship is illustrated as follows:
graph TD
A[pyarrow bundled libarrow.so] --> B[pyarrow.whl]
A --> C[graphar.so <br> C++ Core]
C --> D[graphar.whl <br> Python Binding]
B -.-> D
style A fill:#f9f,stroke:#333,stroke-width:2px
style D fill:#bbf,stroke:#333,stroke-width:2px
Key Challenges
1. ABI Compatibility (The "Segfault" Risk)
The C++ ABI (Application Binary Interface) is not guaranteed to be stable across different major versions of Apache Arrow.
- Risk: If graphar.so is built against the libarrow.so from pyarrow v14.0.0, but the user updates to pyarrow v15.0.0 at runtime, changes in class memory layouts or function signatures could cause immediate Segmentation Faults.
- Difficulty: We need to determine a strategy to manage version constraints effectively, ensuring the build-time Arrow version is ABI-compatible with the runtime Arrow version.
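One possible mitigation is an import-time guard that fails with a clear error instead of a segfault. The sketch below is only an illustration: the function names and the "same major version means compatible" policy are assumptions taken from the framing above, not a guarantee made by Arrow, and a real policy may need to be stricter.

```python
import re


def arrow_major(version: str) -> int:
    """Return the leading major number of a version string such as '14.0.0'."""
    match = re.match(r"(\d+)", version.strip())
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return int(match.group(1))


def check_arrow_abi(build_version: str, runtime_version: str) -> None:
    """Raise ImportError when the runtime Arrow major version differs from
    the one used at build time.

    ASSUMPTION: matching major versions is treated as "ABI-compatible".
    This is a hypothetical policy for illustration only.
    """
    if arrow_major(build_version) != arrow_major(runtime_version):
        raise ImportError(
            f"graphar was built against Arrow {build_version} but the "
            f"installed pyarrow is {runtime_version}; install a pyarrow "
            f"release with major version {arrow_major(build_version)}"
        )
```

In a package `__init__.py`, this could run before the native extension is imported, for example `check_arrow_abi(BUILD_ARROW_VERSION, pyarrow.__version__)`, where `BUILD_ARROW_VERSION` is a hypothetical constant recorded at build time. The same constraint would normally also be expressed as a version pin in the wheel's dependency metadata.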
2. Runtime Linkage (RPATH Resolution)
Unlike system libraries located in /usr/lib, the target libarrow.so resides deep within the python site-packages/pyarrow directory.
- Challenge: Neither the build-time linker nor the runtime dynamic loader will find this library by default. graphar.so must be configured (likely via RPATH) to dynamically locate libarrow.so relative to its own location at runtime, without forcing users to manually manipulate LD_LIBRARY_PATH.
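As a rough sketch of the RPATH idea, the helper below computes an `$ORIGIN`-relative entry from the directory that will hold graphar.so to the directory that holds pyarrow's libraries. The function name and the example paths are hypothetical; `$ORIGIN` is the ELF (Linux) token, and macOS would need `@loader_path` instead.

```python
import posixpath


def origin_relative_rpath(extension_dir: str, arrow_lib_dir: str) -> str:
    """Build an RPATH entry that points from the extension's directory
    to the Arrow library directory, relative to $ORIGIN.

    Both arguments are expected to be absolute POSIX paths.
    """
    rel = posixpath.relpath(arrow_lib_dir, start=extension_dir)
    # When both directories are the same, $ORIGIN alone is sufficient.
    return "$ORIGIN" if rel == "." else "$ORIGIN/" + rel


# Hypothetical site-packages layout, for illustration only.
site = "/venv/lib/python3.11/site-packages"
print(origin_relative_rpath(site + "/graphar", site + "/pyarrow"))
```

At build time the Arrow directory could be taken from `pyarrow.get_library_dirs()`, and the resulting string passed to the linker (for example through CMake's `INSTALL_RPATH` property). Because the entry is relative, it stays valid as long as graphar and pyarrow are installed side by side in the same site-packages directory.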
3. The "Two Arrows" Problem (ODR Violation)
If this linking is not handled correctly (e.g., if GraphAr accidentally links to a static Arrow or a different system Arrow), we risk having two different copies of Arrow code in the process memory.
- Consequence: This would violate the One Definition Rule (ODR). Passing objects (like pyarrow.Table) between GraphAr and PyArrow would lead to undefined behavior, data corruption, or crashes.
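A simple diagnostic for the shared-library variant of this problem is to list which libarrow files are mapped into the process. The sketch below parses text in the Linux `/proc/<pid>/maps` format; the function name is hypothetical, and the check cannot see an Arrow that was linked statically into another library.

```python
import re

# Matches ".../libarrow.so" with optional numeric version suffixes,
# but not sibling libraries such as libarrow_python.so.
_ARROW_LIB = re.compile(r"/libarrow\.so(\.\d+)*$")


def find_arrow_libs(maps_text: str) -> list:
    """Return the distinct libarrow paths found in /proc/<pid>/maps text."""
    paths = set()
    for line in maps_text.splitlines():
        # Fields: address, perms, offset, dev, inode, pathname (optional).
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping without a pathname
        path = fields[5].strip()
        if _ARROW_LIB.search(path):
            paths.add(path)
    return sorted(paths)
```

After importing both pyarrow and graphar on Linux, calling `find_arrow_libs(open("/proc/self/maps").read())` and getting more than one path back would indicate that two Arrow shared libraries are loaded, which is the situation described above.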
Objective
We need to design a build strategy that successfully links against the pyarrow-bundled libraries while solving the RPATH and ABI compatibility issues.
Component(s)
Python, Developer Tools