Feat/support aws s3 v3 #115

Merged
shannonwells merged 7 commits into main from feat/support-aws-s3-v3
Jan 30, 2024
Conversation

shannonwells
Collaborator

@shannonwells shannonwells commented Jan 23, 2024

Problem

Support AWS S3 V3 streams while retaining support for V2. V2 may be removed later.

Closes #32

with @wilwade, @pfrank13

Solution

Branch when the stream looks like an AWS V3 stream and handle it accordingly. I mostly used @pfrank13's workaround code.
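
To illustrate the idea of branching on what the response body looks like, here is a minimal sketch. It is not the code from this PR: `bodyToBuffer` and `isAsyncIterable` are hypothetical helpers, and the sketch assumes that a V2-style body arrives already buffered as a `Uint8Array`/`Buffer` while a V3-style body in Node.js is an async-iterable stream of byte chunks.

```typescript
import { Buffer } from "node:buffer";

// Assumption: these two shapes stand in for V2-style and V3-style bodies.
type BodyLike = Uint8Array | AsyncIterable<Uint8Array>;

// Hypothetical helper: duck-type check for a streaming (V3-style) body.
function isAsyncIterable(value: unknown): value is AsyncIterable<Uint8Array> {
  return typeof value === "object" && value !== null && Symbol.asyncIterator in value;
}

// Hypothetical helper: normalize either kind of body into a single Buffer.
async function bodyToBuffer(body: BodyLike): Promise<Buffer> {
  if (body instanceof Uint8Array) {
    // V2-style: the bytes are already in memory (Buffer is a Uint8Array).
    return Buffer.from(body);
  }
  if (isAsyncIterable(body)) {
    // V3-style: drain the stream chunk by chunk, then join the chunks.
    const chunks: Uint8Array[] = [];
    for await (const chunk of body) {
      chunks.push(chunk);
    }
    return Buffer.concat(chunks);
  }
  throw new TypeError("Unsupported S3 response body");
}
```

Checking for the buffered case first keeps the V2 path unchanged, so the streaming branch only runs for bodies that actually need draining.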

Change summary:

  • support V3 in reader.ts
  • add tests
  • use it.skip instead of commenting out test code.

Steps to Verify:

  1. All tests should pass (you can check red/green by temporarily taking the changes out, if you like)
  2. If you have an S3 bucket and credentials, verify that it works with a real AWS V3 stream. Example code:
import { S3Client } from '@aws-sdk/client-s3';
import { ParquetReader } from '@dsnp/parquetjs';

const main = async () => {
  // Placeholder region and credentials: substitute your own.
  const s3 = new S3Client({
    region: 'us-west-1',
    credentials: {
      accessKeyId: 'asdfkldfsjlfdsjkl',
      secretAccessKey: 'dsfjkfsdjklfsjkl',
    },
  });
  // Placeholder bucket and object key.
  const Bucket = 'foo';
  const Key = 'bar.parquet';

  // Open the Parquet file from S3 using the V3 client.
  const reader = await ParquetReader.openS3(s3, { Key, Bucket });

  // Print the file metadata.
  console.log(reader.envelopeReader?.metadata);
};

main().catch(console.error).finally(process.exit);

You should see output like:

{
  version: 1,
  schema: [
    {
      type: null,
      type_length: null,
      repetition_type: null,
      name: 'm',
      num_children: 4,
      converted_type: null,
      scale: null,
      precision: null,
      field_id: null,
      logicalType: null
    },
    {
      type: 1,
      type_length: null,
      repetition_type: 1,
      name: 'nation_key',
      num_children: null,
      converted_type: null,
      scale: null,
      precision: null,
      field_id: null,
      logicalType: null
    },
    {
      type: 6,
      type_length: null,
      repetition_type: 1,
      name: 'name',
      num_children: null,
      converted_type: null,
      scale: null,
      precision: null,
      field_id: null,
      logicalType: null
    },
    {
      type: 1,
      type_length: null,
      repetition_type: 1,
      name: 'region_key',
      num_children: null,
      converted_type: null,
      scale: null,
      precision: null,
      field_id: null,
      logicalType: null
    },
    {
      type: 6,
      type_length: null,
      repetition_type: 1,
      name: 'comment_col',
      num_children: null,
      converted_type: null,
      scale: null,
      precision: null,
      field_id: null,
      logicalType: null
    }
  ],
  num_rows: { buffer: <Buffer 00 00 00 00 00 00 00 19>, offset: 0 },
  row_groups: [
    {
      columns: [Array],
      total_byte_size: [Object],
      num_rows: [Object],
      sorting_columns: null,
      file_offset: null,
      total_compressed_size: null,
      ordinal: null
    }
  ],
  key_value_metadata: null,
  created_by: 'parquet-mr',
  column_orders: null,
  encryption_algorithm: null,
  footer_signing_key_metadata: null
}
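
The `num_rows` field in the sample output above is printed as a raw 8-byte buffer rather than a number. As a rough sketch, assuming those bytes are a big-endian signed 64-bit integer (the usual layout for Thrift i64 values), they can be decoded as follows. `int64BytesToBigInt` is a hypothetical helper, not part of parquetjs.

```typescript
import { Buffer } from "node:buffer";

// Bytes copied from the num_rows field in the sample output.
const numRowsBytes = Buffer.from([0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x19]);

// Hypothetical helper: read 8 bytes as a big-endian signed 64-bit integer.
// Assumption: the buffer uses big-endian byte order.
function int64BytesToBigInt(bytes: Buffer, offset = 0): bigint {
  return bytes.readBigInt64BE(offset);
}

const numRows = int64BytesToBigInt(numRowsBytes);
console.log(numRows); // 25n
```

Under that assumption, the sample file has 25 rows.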

@shannonwells shannonwells marked this pull request as ready for review January 23, 2024 01:32
@shannonwells shannonwells force-pushed the feat/support-aws-s3-v3 branch from 8875554 to 9e083f3 Compare January 23, 2024 01:33
@shannonwells shannonwells marked this pull request as draft January 23, 2024 20:32
@shannonwells shannonwells marked this pull request as ready for review January 30, 2024 01:41
Member

@wilwade wilwade left a comment

🚢 it!

  • ✅ Reviewed
  • ✅ Built locally
  • ✅ Ran example using the browser build

@shannonwells shannonwells merged commit 117e5a5 into main Jan 30, 2024
1 check passed
@shannonwells shannonwells deleted the feat/support-aws-s3-v3 branch January 30, 2024 21:13

- if (trailerBuf.slice(4).toString() != PARQUET_MAGIC) {
+ if (trailerBuf.subarray(4).toString() != PARQUET_MAGIC) {
Collaborator Author

NB: Buffer's slice() has been deprecated in Node.js in favor of subarray()
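
For context, a small sketch of the trailer check using `subarray`. It assumes `PARQUET_MAGIC` is the string `PAR1` (the Parquet format's magic bytes) and that the trailer is 8 bytes: a 4-byte little-endian footer length followed by the magic. `hasParquetMagic` is a hypothetical helper written for this illustration. The sketch also shows that `subarray` returns a view over the same memory, not a copy.

```typescript
import { Buffer } from "node:buffer";

// Assumption: matches the PARQUET_MAGIC constant referenced in the diff.
const PARQUET_MAGIC = "PAR1";

// Hypothetical helper: true when the last 4 bytes of the 8-byte trailer are the magic.
function hasParquetMagic(trailerBuf: Buffer): boolean {
  return trailerBuf.subarray(4).toString() === PARQUET_MAGIC;
}

// Build a fake trailer: footer length (arbitrary value) plus the magic bytes.
const trailer = Buffer.alloc(8);
trailer.writeUInt32LE(1234, 0);
trailer.write(PARQUET_MAGIC, 4);

const validBefore = hasParquetMagic(trailer); // true

// subarray() shares memory with the original buffer, so writing
// through the view also changes the trailer itself.
const view = trailer.subarray(4);
view[0] = 0x58; // overwrite 'P' with 'X'

const validAfter = hasParquetMagic(trailer); // false
console.log(validBefore, validAfter);
```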

Development

Successfully merging this pull request may close these issues.

Upgrade to AWS SDK V3
2 participants