Upgrade to AWS SDK V3 #32

Closed
pfrank13 opened this issue Nov 18, 2021 · 1 comment · Fixed by #115

pfrank13 commented Nov 18, 2021

Steps to reproduce

  1. Create a client with https://www.npmjs.com/package/@aws-sdk/client-s3 and pass it into https://github.com/LibertyDSNP/parquetjs/blob/main/lib/reader.js#L115
  2. Notice that it fails because client.getObject is undefined
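
The failure in step 2 follows from the shape of the V3 client: `S3Client` exposes a generic `send(command)` method and has no per-operation methods such as `getObject`. A minimal sketch of the mismatch, using a hypothetical stand-in object instead of the real SDK so that it runs without credentials (`fakeV3Client` and `callV2Style` are illustrative names, not library code):

```typescript
// Hypothetical stand-in for a V3-style client: only `send` exists.
const fakeV3Client = {
  send: async (command: unknown): Promise<unknown> => ({ command }),
};

// Mimics a V2-style call site that expects `client.getObject(params)`.
function callV2Style(client: any): string {
  try {
    client.getObject({ Bucket: "example-bucket", Key: "example.parquet" });
    return "ok";
  } catch (err) {
    // Calling an undefined property as a function raises a TypeError.
    return err instanceof TypeError ? "TypeError" : "other error";
  }
}

const result = callV2Style(fakeV3Client);
```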

Expected behaviour

  1. Ideally, the reader would work with clients from AWS SDK V3

Actual behaviour

The reader appears to assume an AWS SDK V2 client

Any other comments?

This is not so much a bug as a request to update this library to support AWS SDK V3.

This is how we worked around it in our application:

// Workaround for functionality that @dsnp/parquetjs does not support yet
import { GetObjectCommand, HeadObjectCommand, S3Client, GetObjectCommandInput } from "@aws-sdk/client-s3";
import { Readable } from "stream";
import { Blob } from "buffer";
const parquet = require("@dsnp/parquetjs");
const { ParquetReader, ParquetEnvelopeReader } = parquet;

export const openS3Reader = async (
  client: S3Client,
  params: GetObjectCommandInput,
  options?: any
): Promise<typeof ParquetReader> => {
  const fileStat = async () => {
    const headObjectResult = await client.send(new HeadObjectCommand(params));
    return headObjectResult.ContentLength;
  };

  const readFn = async (offset: number, length: number, file?: string): Promise<Buffer> => {
    if (file) {
      throw new Error("external references are not supported");
    }
    // HTTP byte ranges are inclusive on both ends, hence the "- 1"
    const Range = `bytes=${offset}-${offset + length - 1}`;
    // Spread params first so the computed Range cannot be overridden by the caller
    const response = await client.send(new GetObjectCommand({ ...params, Range }));

    const body = response.Body;
    if (body) {
      return streamToBuffer(body);
    }
    return Buffer.of();
  };

  const closeFn = () => ({});

  const envelopeReader = new ParquetEnvelopeReader(readFn, closeFn, fileStat, options);

  return ParquetReader.openEnvelopeReader(envelopeReader, options);
};

async function streamToBuffer(body: any): Promise<Buffer> {
  // Blob-like bodies expose arrayBuffer()
  const blob = body as Blob;
  if (typeof blob.arrayBuffer === "function") {
    // Buffer.from replaces the deprecated `new Buffer(...)` constructor
    return Buffer.from(await blob.arrayBuffer());
  }

  // Otherwise, assume a Readable-like object
  const readable = body as Readable;
  return await new Promise((resolve, reject) => {
    const chunks: Uint8Array[] = [];
    readable.on("data", (chunk) => chunks.push(chunk));
    readable.on("error", reject);
    readable.on("end", () => resolve(Buffer.concat(chunks)));
  });
}
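
The two mechanical pieces of the workaround above can be tried in isolation. The following is a minimal sketch, not part of @dsnp/parquetjs: `toRangeHeader` and `collectReadable` are hypothetical helper names, and the stream helper assumes a Node.js `Readable` body and uses async iteration in place of the event listeners.

```typescript
import { Readable } from "node:stream";

// HTTP Range headers are inclusive on both ends, so reading `length`
// bytes starting at `offset` ends at byte `offset + length - 1`.
function toRangeHeader(offset: number, length: number): string {
  if (length <= 0) {
    throw new RangeError("length must be positive");
  }
  return `bytes=${offset}-${offset + length - 1}`;
}

// Collects every chunk of a Readable into a single Buffer.
// A stream error rejects the returned promise.
async function collectReadable(readable: Readable): Promise<Buffer> {
  const chunks: Buffer[] = [];
  for await (const chunk of readable) {
    // Buffer.from accepts strings, Buffers, and Uint8Arrays alike.
    chunks.push(Buffer.from(chunk));
  }
  return Buffer.concat(chunks);
}
```

For example, `toRangeHeader(100, 50)` produces `bytes=100-149`, which requests exactly 50 bytes.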

Notes

  • It is OK to lose support for AWS V2 when doing this update

Points: 3

@wilwade wilwade moved this to 🧊 Icebox in DSNP and Frequency Project Nov 18, 2021
@wilwade wilwade moved this from 🧊 Icebox to 🪵 Backlog in DSNP and Frequency Project Nov 18, 2021
@wilwade wilwade added the enhancement New feature or request label Nov 18, 2021
wilwade (Member) commented Mar 4, 2022

Might be worth looking at ZJONSSON#64

@shannonwells shannonwells self-assigned this Apr 15, 2022
@wilwade wilwade changed the title from "Support AWS SDK V3" to "Upgrade to AWS SDK V3" Jun 13, 2022
@wilwade wilwade moved this from 🪵 Backlog to 🧊 Icebox in DSNP and Frequency Project Aug 11, 2022
@wilwade wilwade moved this from 🧊 Icebox to 🪵 Backlog in DSNP and Frequency Project Nov 14, 2022
@wilwade wilwade added the planning Discuss & point in planning meeting label Nov 14, 2022
@saraswatpuneet saraswatpuneet removed the planning Discuss & point in planning meeting label Nov 28, 2022
@shannonwells shannonwells self-assigned this Jan 10, 2024
shannonwells added a commit that referenced this issue Jan 30, 2024
Problem
=======
Support AWS S3 V3 streams while retaining support for V2. V2 may be removed later.
Closes #32
with @wilwade, @pfrank13

Change summary:
---------------
* support V3 in reader.ts
* add tests
* use `it.skip` instead of commenting out test code.
---------

Co-authored-by: Wil Wade <[email protected]>
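
One way to support both SDK generations at once is to branch on the client's shape. The sketch below is an assumption about how that could look, not the code from the referenced commit: the interfaces are simplified structural stand-ins for the real SDK types, `isV3Style` and `fetchBody` are hypothetical names, and the check assumes that only V3-style clients expose `send`. The `makeCommand` parameter stands in for `new GetObjectCommand(params)` so that the example runs without the SDK installed.

```typescript
// Simplified, hypothetical shapes; the real SDK types are much richer.
interface V2StyleClient {
  getObject(params: object): { promise(): Promise<{ Body?: unknown }> };
}
interface V3StyleClient {
  send(command: unknown): Promise<{ Body?: unknown }>;
}
type AnyS3Client = V2StyleClient | V3StyleClient;

// Assumption: a callable `send` is enough to identify a V3-style client.
function isV3Style(client: AnyS3Client): client is V3StyleClient {
  return typeof (client as V3StyleClient).send === "function";
}

// Fetches the response body using whichever call style the client supports.
async function fetchBody(
  client: AnyS3Client,
  params: object,
  makeCommand: (p: object) => unknown,
): Promise<unknown> {
  if (isV3Style(client)) {
    const response = await client.send(makeCommand(params));
    return response.Body;
  }
  const response = await client.getObject(params).promise();
  return response.Body;
}
```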

4 participants