Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Spectrum STORED AS PARQUET does not output expected DDL #194

Closed
1 of 3 tasks
mattppal opened this issue Mar 31, 2023 · 5 comments
Closed
1 of 3 tasks

Spectrum STORED AS PARQUET does not output expected DDL #194

mattppal opened this issue Mar 31, 2023 · 5 comments
Labels
bug Something isn't working redshift Stale triage

Comments

@mattppal
Copy link

Describe the bug

When defining external tables in Redshift Spectrum stored as parquet, the expected DDL is not returned by dbt-external-tables, rendering the external table unreadable.

Steps to reproduce

Config:

version: 2
sources:
  - name: spectrum
    schema: spectrum
    loader: S3
    loaded_at_field: loaded_at
    tables:
      - name: abc
        external:
          location: ...
          stored_as: PARQUET

Expected results

SHOW EXTERNAL TABLE spectrum.abc

Should yield

CREATE EXTERNAL TABLE spectrum.abc (
    ...
)
PARTITIONED BY ( .. )
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION 'xyz';

Since this is what is output when I run:

CREATE EXTERNAL TABLE spectrum.abc (
    ...
)
PARTITIONED BY ( .. )
STORED AS PARQUET
LOCATION 'xyz';

Actual results

The above command returns:

CREATE EXTERNAL TABLE spectrum.abc (
    ...
)
PARTITIONED BY ( .. )
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION 'xzy';

System information

packages:
  - package: dbt-labs/codegen
    version: 0.9.0
  - package: dbt-labs/redshift
    version: 0.8.0
  - package: dbt-labs/dbt_utils
    version: 1.0.0
  - package: dbt-labs/metrics
    version: 1.4.1
  - package: dbt-labs/dbt_external_tables
    version: 0.8.3

Which database are you using dbt with?

  • redshift
  • snowflake
  • other (specify: ____________)

The output of dbt --version:

Core:
  - installed: 1.4.5
  - latest:    1.4.5 - Up to date!

Plugins:
  - redshift: 1.4.0 - Up to date!
  - postgres: 1.4.5 - Up to date!

The operating system you're using:
Python 3.9.0

Additional context

@mattppal mattppal added bug Something isn't working triage labels Mar 31, 2023
Copy link

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.

@github-actions github-actions bot added the Stale label Jan 22, 2024
@mattppal
Copy link
Author

Bump

@github-actions github-actions bot removed the Stale label Jan 23, 2024
@padbk
Copy link

padbk commented Jan 23, 2024

row_format: serde 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
file_format: parquet

The above works for me. No need for stored_as

Copy link

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.

@github-actions github-actions bot added the Stale label Jul 22, 2024
Copy link

Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest. Just add a comment to notify the maintainers.

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Jul 30, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working redshift Stale triage
Projects
None yet
Development

No branches or pull requests

3 participants