Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix dbt docs generate for athena execution engines 1 & 2. #37

Closed
wants to merge 4 commits into from

Conversation

haleemur
Copy link

@haleemur haleemur commented Dec 7, 2020

execution engine 2 is being rolled out in us-east-1 (and a few other regions) currently.
soon, it will become the default engine athena uses worldwide.

there are some sql-parsing incompatibilities between the two versions when tables are
joined like this

from table1
join table2 using (col1, col2)

in version 1, it was necessary to specify col1 as either table1.col1 or table2.col1
in the where clause or select clause. however in version 2, that generates an error

to ensure compatibility across both versions, the simplest thing to do is to avoid specifying
the join as using (col1, col2)

this pattern is currently used in fetching the database metadata when executing dbt docs generate, so
I was not able to build docs against athena running on execution engine version 2.

with the change, dbt docs generate works successfully across both engine versions.

limitation with athena execution engine version 1

information.colunms does not seem to contain information about views. This seems to have been fixed in engine v2

execution engine 2 is being rolled out in us-east-1 (and a few other regions) currently.
soon, it will become the default engine athena uses worldwide.

there are some sql-parsing incompatibilities between the two versions when tables are
joined like this

```
form table1
join table2 using (col1, col2)
```

in version 1, it was necessary to specify `col1` as either `table1.col1` or `table2.col1`
in the where clause or select clause. however in version 2, that generates an error

to ensure compatibility across both versions, the simplest thing to do is to avoid specifying
the join as `using (col1, col2)`

this pattern is currently used in fetching the database metadata when executing `dbt docs generate`, so
I was not able to build docs against athena running on execution engine version 2.

with the change, `dbt docs generate` works successfully across both engine versions.
@stumelius
Copy link

stumelius commented Dec 14, 2020

Hi @haleemur ,

I tried your updated catalog script with both dbt 0.17.2 and 0.18.1 and managed to successfully generate the docs. However, the docs are missing columns that are not specified in schema.yml files. And the columns that are defined in the schema files are missing column types if the model is a view. For materialized models the column types are shown correctly. There's probably something wrong with combining the catalog data with the data from schema.yml files. Do you happen to know anything about this?

EDIT: I checked the generated manifest.json and it only contained my seed nodes. Further investigation revealed that

select
*
from information_schema.columns

does not contain any of my view models. The views and their columns are available in the Glue console though, so the metadata exists but not in information_schema.columns...

@haleemur
Copy link
Author

@smomni yes, i noticed that as well & had made a note of it at the end of my pull request's message

limitation with athena execution engine version 1
information.columns does not seem to contain information about views. This seems to have been fixed in engine v2

this PR does not attempt to fix that issue as it is a bug with athena, which is resolved in v2. I had considered looking up the glue catalog directly, but that adds additional complexity, the need for which will go away as aws rolls out the newer engine across more regions.

thank you for testing that this branch works with dbt versions 0.17.2 & 0.18.1. that has been sitting on the back of my mind for a while.

@haleemur haleemur closed this Nov 29, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants