Extract arrowarray and arrowschema creation#669
Extract arrowarray and arrowschema creation#669Alex-PLACET wants to merge 7 commits intoman-group:mainfrom
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #669 +/- ##
==========================================
+ Coverage 88.53% 88.61% +0.07%
==========================================
Files 115 117 +2
Lines 9545 9651 +106
==========================================
+ Hits 8451 8552 +101
- Misses 1094 1099 +5
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Pull request overview
This PR centralizes Arrow C Data Interface (ArrowArray / ArrowSchema) construction logic into a shared factory to reduce duplication across array implementations.
Changes:
- Introduces
include/sparrow/layout/arrow_schema_array_factory.hppandsrc/layout/arrow_schema_array_factory.cppto build commonArrowSchema/ArrowArrayinstances for multiple layouts. - Refactors several array types (primitive, list, map, union, timestamp, decimal, dictionary-encoded, variable-size binary/view, run-end encoded, null) to use the new factory helpers.
- Wires the new header/source into the build via
CMakeLists.txt.
Reviewed changes
Copilot reviewed 16 out of 16 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
include/sparrow/layout/arrow_schema_array_factory.hpp |
New shared factory helpers for Arrow schema/array creation (contains several correctness/contract issues for empty children). |
src/layout/arrow_schema_array_factory.cpp |
Implements the non-template factory functions declared in the new header. |
include/sparrow/layout/primitive_array_impl.hpp |
Uses factory helpers for primitive Arrow schema/array creation. |
include/sparrow/list_array.hpp |
Migrates list array schema/array creation to factory helpers. |
include/sparrow/map_array.hpp |
Migrates map array schema/array creation to factory helpers and passes keys_sorted explicitly. |
include/sparrow/struct_array.hpp |
Migrates struct schema/array creation to factory helpers; changes sizing logic. |
include/sparrow/union_array.hpp |
Migrates union schema/array creation to factory helpers. |
include/sparrow/run_end_encoded_array.hpp |
Migrates run-end encoded schema/array creation to factory helpers. |
include/sparrow/null_array.hpp |
Migrates null schema/array creation to factory helpers. |
include/sparrow/timestamp_array.hpp |
Migrates timestamp schema/array creation to factory helpers. |
include/sparrow/decimal_array.hpp |
Migrates decimal schema/array creation to factory helpers. |
include/sparrow/dictionary_encoded_array.hpp |
Migrates dictionary-encoded schema/array creation to factory helpers. |
include/sparrow/variable_size_binary_array.hpp |
Migrates variable-size binary schema/array creation to factory helpers. |
include/sparrow/variable_size_binary_view_array.hpp |
Migrates binary-view schema/array creation to factory helpers and introduces create_proxy_impl. |
include/sparrow/fixed_width_binary_array.hpp |
Migrates fixed-width binary schema/array creation to factory helpers. |
CMakeLists.txt |
Adds the new header and source file to the build. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| const std::size_t n = child_arrays.size(); | ||
| ArrowArray** ptrs = new ArrowArray*[n]; | ||
| for (std::size_t i = 0; i < n; ++i) | ||
| { | ||
| ptrs[i] = new ArrowArray(std::move(child_arrays[i])); | ||
| } | ||
| std::vector<buffer<uint8_t>> buffers; | ||
| buffers.reserve(1); | ||
| buffers.emplace_back(std::move(bitmap_buf)); | ||
| return make_arrow_array( | ||
| size, | ||
| null_count, | ||
| 0, | ||
| std::move(buffers), | ||
| ptrs, | ||
| repeat_view<bool>(true, n), | ||
| nullptr, |
There was a problem hiding this comment.
make_struct_arrow_array allocates ptrs = new ArrowArray*[n] even when n == 0 and passes it to make_arrow_array. make_arrow_array asserts (children_ownership.size() == 0) == (children == nullptr), so this will fail (and leak) for empty structs. Avoid allocating and pass nullptr for children when there are zero children.
f964c7c to
6f70f44
Compare
for more information, see https://pre-commit.ci
…/github.com/Alex-PLACET/sparrow into extract_arrowarray_and_arrowschema_creation
for more information, see https://pre-commit.ci
No description provided.