forked from apache/arrow
Encryption #1
Open
andersonm-ibm wants to merge 467 commits into master from encryption
Thanks for opening a pull request! If this is not a minor PR, could you open an issue for this pull request on JIRA? https://issues.apache.org/jira/browse/ARROW Opening JIRAs ahead of time contributes to the openness of the Apache Arrow project. Then could you also rename the pull request title in the following format?
ddc0dd5 to 483126a
andersonm-ibm pushed a commit that referenced this pull request Jun 20, 2021
Before change:
```
Direct leak of 65536 byte(s) in 1 object(s) allocated from:
#0 0x522f09 in
#1 0x7f28ae5826f4 in
#2 0x7f28ae57fa5d in
#3 0x7f28ae58cb0f in
#4 0x7f28ae58bda0 in
...
```
After change:
```
Direct leak of 65536 byte(s) in 1 object(s) allocated from:
#0 0x522f09 in posix_memalign (/build/cpp/debug/arrow-dataset-file-csv-test+0x522f09)
#1 0x7f28ae5826f4 in arrow::(anonymous namespace)::SystemAllocator::AllocateAligned(long, unsigned char**) /arrow/cpp/src/arrow/memory_pool.cc:213:24
#2 0x7f28ae57fa5d in arrow::BaseMemoryPoolImpl<arrow::(anonymous namespace)::SystemAllocator>::Allocate(long, unsigned char**) /arrow/cpp/src/arrow/memory_pool.cc:405:5
#3 0x7f28ae58cb0f in arrow::PoolBuffer::Reserve(long) /arrow/cpp/src/arrow/memory_pool.cc:717:9
#4 0x7f28ae58bda0 in arrow::PoolBuffer::Resize(long, bool) /arrow/cpp/src/arrow/memory_pool.cc:741:7
...
```
Closes apache#10498 from westonpace/feature/ARROW-13027--c-fix-asan-stack-traces-in-ci
Authored-by: Weston Pace <[email protected]>
Signed-off-by: Antoine Pitrou <[email protected]>
483126a to 6924e3f
6924e3f to d2eb711
851fee9 to 3b8d8c3
1699d68 to c767938
65bb905 to 72c34fe
andersonm-ibm pushed a commit that referenced this pull request Nov 28, 2021
Error log of Valgrind failure:
```
[----------] 3 tests from TestArrowReadDeltaEncoding
[ RUN ] TestArrowReadDeltaEncoding.DeltaBinaryPacked
[ OK ] TestArrowReadDeltaEncoding.DeltaBinaryPacked (812 ms)
[ RUN ] TestArrowReadDeltaEncoding.DeltaByteArray
==12587== Conditional jump or move depends on uninitialised value(s)
==12587== at 0x4F12C57: Advance (bit_stream_utils.h:426)
==12587== by 0x4F12C57: parquet::(anonymous namespace)::DeltaBitPackDecoder<parquet::PhysicalType<(parquet::Type::type)1> >::GetInternal(int*, int) (encoding.cc:2216)
==12587== by 0x4F13823: Decode (encoding.cc:2091)
==12587== by 0x4F13823: parquet::(anonymous namespace)::DeltaByteArrayDecoder::SetData(int, unsigned char const*, int) (encoding.cc:2360)
==12587== by 0x4E89EF5: parquet::(anonymous namespace)::ColumnReaderImplBase<parquet::PhysicalType<(parquet::Type::type)6> >::InitializeDataDecoder(parquet::DataPage const&, long) (column_reader.cc:797)
==12587== by 0x4E9AE63: ReadNewPage (column_reader.cc:614)
==12587== by 0x4E9AE63: HasNextInternal (column_reader.cc:576)
==12587== by 0x4E9AE63: parquet::internal::(anonymous namespace)::TypedRecordReader<parquet::PhysicalType<(parquet::Type::type)6> >::ReadRecords(long) (column_reader.cc:1228)
==12587== by 0x4DFB19F: parquet::arrow::(anonymous namespace)::LeafReader::LoadBatch(long) (reader.cc:467)
==12587== by 0x4DF513C: parquet::arrow::ColumnReaderImpl::NextBatch(long, std::shared_ptr<arrow::ChunkedArray>*) (reader.cc:108)
==12587== by 0x4DFB74D: parquet::arrow::(anonymous namespace)::FileReaderImpl::ReadColumn(int, std::vector<int, std::allocator<int> > const&, parquet::arrow::ColumnReader*, std::shared_ptr<arrow::ChunkedArray>*) (reader.cc:273)
==12587== by 0x4E11FDA: operator() (reader.cc:1180)
==12587== by 0x4E11FDA: arrow::Future<std::vector<std::shared_ptr<arrow::ChunkedArray>, std::allocator<arrow::Future> > > arrow::internal::OptionalParallelForAsync<parquet::arrow::(anonymous namespace)::FileReaderImpl::DecodeRowGroups(std::shared_ptr<parquet::arrow::(anonymous namespace)::FileReaderImpl>, std::vector<int, std::allocator<int> > const&, std::vector<int, std::allocator<int> > const&, arrow::internal::Executor*)::{lambda(unsigned long, std::shared_ptr<parquet::arrow::ColumnReaderImpl>)#1}&, std::shared_ptr<parquet::arrow::ColumnReaderImpl>, std::shared_ptr<arrow::ChunkedArray> >(bool, std::vector<std::shared_ptr<parquet::arrow::ColumnReaderImpl>, std::allocator<arrow::Future<std::vector<std::shared_ptr<arrow::ChunkedArray>, std::allocator<arrow::Future> > > > >, parquet::arrow::(anonymous namespace)::FileReaderImpl::DecodeRowGroups(std::shared_ptr<parquet::arrow::(anonymous namespace)::FileReaderImpl>, std::vector<int, std::allocator<int> > const&, std::vector<int, std::allocator<int> > const&, arrow::internal::Executor*)::{lambda(unsigned long, std::shared_ptr<parquet::arrow::ColumnReaderImpl>)#1}&, arrow::internal::Executor*) (parallel.h:95)
==12587== by 0x4E126A9: parquet::arrow::(anonymous namespace)::FileReaderImpl::DecodeRowGroups(std::shared_ptr<parquet::arrow::(anonymous namespace)::FileReaderImpl>, std::vector<int, std::allocator<int> > const&, std::vector<int, std::allocator<int> > const&, arrow::internal::Executor*) (reader.cc:1198)
==12587== by 0x4E12F50: parquet::arrow::(anonymous namespace)::FileReaderImpl::ReadRowGroups(std::vector<int, std::allocator<int> > const&, std::vector<int, std::allocator<int> > const&, std::shared_ptr<arrow::Table>*) (reader.cc:1160)
==12587== by 0x4DFA2BC: parquet::arrow::(anonymous namespace)::FileReaderImpl::ReadTable(std::vector<int, std::allocator<int> > const&, std::shared_ptr<arrow::Table>*) (reader.cc:198)
==12587== by 0x4DFA392: parquet::arrow::(anonymous namespace)::FileReaderImpl::ReadTable(std::shared_ptr<arrow::Table>*) (reader.cc:289)
==12587== by 0x1DCE62: parquet::arrow::TestArrowReadDeltaEncoding::ReadTableFromParquetFile(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::shared_ptr<arrow::Table>*) (arrow_reader_writer_test.cc:4174)
==12587== by 0x2266D2: parquet::arrow::TestArrowReadDeltaEncoding_DeltaByteArray_Test::TestBody() (arrow_reader_writer_test.cc:4209)
==12587== by 0x4AD2C9B: void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (gtest.cc:2607)
==12587== by 0x4AC9DD1: void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (gtest.cc:2643)
==12587== by 0x4AA4C02: testing::Test::Run() (gtest.cc:2682)
==12587== by 0x4AA563A: testing::TestInfo::Run() (gtest.cc:2861)
==12587== by 0x4AA600F: testing::TestSuite::Run() (gtest.cc:3015)
==12587== by 0x4AB631B: testing::internal::UnitTestImpl::RunAllTests() (gtest.cc:5855)
==12587== by 0x4AD3CE7: bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) (gtest.cc:2607)
==12587== by 0x4ACB063: bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) (gtest.cc:2643)
==12587== by 0x4AB47B6: testing::UnitTest::Run() (gtest.cc:5438)
==12587== by 0x4218918: RUN_ALL_TESTS() (gtest.h:2490)
==12587== by 0x421895B: main (gtest_main.cc:52)
```
Closes apache#11725 from pitrou/ARROW-14704-parquet-valgrind
Authored-by: Antoine Pitrou <[email protected]>
Signed-off-by: Antoine Pitrou <[email protected]>
a435814 to 194deb1
8d50e3a to 535f05d
9d53c1f to 46165ee
…s_monday when rounding to multiple of week
This is to resolve [ARROW-15680](https://issues.apache.org/jira/browse/ARROW-15680).
Closes apache#12515 from rok/ARROW-15680
Authored-by: Rok <[email protected]>
Signed-off-by: David Li <[email protected]>

…transports
This is intended to support ARROW-15706 and ARROW-15282. This splits out tests which do not touch the transport into their own test. Meanwhile, data plane methods are moved into a common library and can be instantiated for specific transports. This also does some cleanup to remove some redundant test helpers, and removes a DISABLED test that was effectively never being run.
Closes apache#12499 from lidavidm/arrow-15707
Authored-by: David Li <[email protected]>
Signed-off-by: Yibo Cai <[email protected]>

…ndas if it contains an ExtensionArray
This PR tries to solve [ARROW-15291](https://issues.apache.org/jira/browse/ARROW-15291). Not sure about `if/else` in the `arrow_to_pandas.cc`. Maybe there is a way to redefine a field `shared_ptr` and optimize a bit? @pitrou what do you think? cc @jorisvandenbossche
Closes apache#12505 from AlenkaF/ARROW-15291
Authored-by: Alenka Frim <[email protected]>
Signed-off-by: Joris Van den Bossche <[email protected]>

…r Scalar Kernels
Don't set bitmap buffer for scalar kernels when all values of inputs are valid
Closes apache#12080 from 9prady9/ARROW-15118-Cast-kernel-always-creates-and-populates
Authored-by: Pradeep Garigipati <[email protected]>
Signed-off-by: David Li <[email protected]>

Closes apache#12504 from cyb70289/15763-csv-writer
Authored-by: Yibo Cai <[email protected]>
Signed-off-by: Antoine Pitrou <[email protected]>

…rsion which is no longer used
Closes apache#12514 from westonpace/bugfix/ARROW-15784--single-file-parquet-read-regression
Authored-by: Weston Pace <[email protected]>
Signed-off-by: David Li <[email protected]>

…files.
Exposes in PyArrow the high-level C++ API for Parquet encryption of files.
Co-authored-by: Roee Shlomo <[email protected]>
…ory that doesn't return an actual KMS client.
Replace pycryptodomex with cryptography library Add test with wrong keys
…y and data key, add documentation and tests. Remove cryptography from python/requirements-test.txt, fix doc
Signed-off-by: roee88 <[email protected]>
Signed-off-by: Maya Anderson <[email protected]>
Co-authored-by: Joris Van den Bossche <[email protected]>
…uet is built for Python. Add back PME to minimal build. Apply review comments documentation and lint cleanup.
In Java the wrapped KEK is written without its length, whereas in C++ everything encrypted was written with its length prefix, including wrapped KEKs. This adds the option for Gcm/Ctr-Encrypt/Decrypt to write and read wrapped KEKs without the written length prefix.
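The difference described in this commit message can be sketched as plain byte framing. This is an illustration only, not the code from this PR: `frame` and `unframe` are hypothetical helper names, the payload stands in for bytes that are already encrypted, and the 4-byte little-endian prefix is an assumption about how the length is written.

```python
import struct

# Assumption for this sketch: the length prefix is a 4-byte little-endian integer.
LENGTH_PREFIX_SIZE = 4


def frame(encrypted: bytes, with_length_prefix: bool = True) -> bytes:
    """Serialize already-encrypted bytes, optionally prepending their length."""
    if with_length_prefix:
        return struct.pack("<I", len(encrypted)) + encrypted
    # Layout without the prefix: the encrypted bytes are written as-is.
    return encrypted


def unframe(buffer: bytes, with_length_prefix: bool = True) -> bytes:
    """Recover the encrypted bytes written by frame() with the same option."""
    if not with_length_prefix:
        return buffer
    if len(buffer) < LENGTH_PREFIX_SIZE:
        raise ValueError("buffer too short to hold a length prefix")
    (length,) = struct.unpack_from("<I", buffer, 0)
    body = buffer[LENGTH_PREFIX_SIZE:]
    if length != len(body):
        # Typical symptom of reading a prefix-less buffer as a prefixed one.
        raise ValueError("length prefix does not match the payload size")
    return body


if __name__ == "__main__":
    wrapped_kek = bytes(range(44))  # placeholder bytes, not a real wrapped key
    print(len(frame(wrapped_kek, with_length_prefix=True)))   # 4 bytes longer
    print(len(frame(wrapped_kek, with_length_prefix=False)))  # same size as input
```

A reader that expects the prefix will misread the first four bytes of a prefix-less buffer as a length, which is why both sides need to agree on the option for wrapped KEKs.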
Co-authored-by: Antoine Pitrou <[email protected]>
Co-authored-by: Antoine Pitrou <[email protected]>
Co-authored-by: Antoine Pitrou <[email protected]>
…ether with parquet encryption. Add a separate build variable for enabling PME - PYARROW_WITH_PARQUET_ENCRYPTION.
update documentation, add missing update to vault client, add a separate pytest mark for parquet_encryption.
andersonm-ibm pushed a commit that referenced this pull request Apr 3, 2022
TODOs: Convert cheat sheet to PDF and hide slide #1.
Closes apache#12445 from pachadotdev/patch-4
Lead-authored-by: Stephanie Hazlitt <[email protected]>
Co-authored-by: Pachá <[email protected]>
Co-authored-by: Mauricio Vargas <[email protected]>
Co-authored-by: Pachá <[email protected]>
Signed-off-by: Jonathan Keane <[email protected]>
No description provided.