Commit 3c88f06

Merge pull request #117 from dlt-hub/feat/update_readme
readme update
2 parents 846221c + be7625f commit 3c88f06

1 file changed

+119
-79
lines changed

README.md

Lines changed: 119 additions & 79 deletions
@@ -1,131 +1,167 @@
1-
# dlt-init-openapi
2-
`dlt-init-openapi` generates [`dlt`](https://dlthub.com/docs) pipelines from OpenAPI 3.x documents/specs using the [`dlt` `rest_api` `verified source`](https://dlthub.com/docs/dlt-ecosystem/verified-sources/rest_api). If you do not know `dlt` or our `verified sources`, please read:
1+
# dlt-init-openapi - An OpenAPI Source Generator for the `dlt` Python Library
32

4-
* [Getting started](https://dlthub.com/docs/getting-started) to learn the `dlt` basics
5-
* [dlt rest_api](https://dlthub.com/docs/dlt-ecosystem/verified-sources/rest_api) to learn how our `rest_api` source works
6-
7-
> This generator does not support OpenAPI 2.x FKA Swagger. If you need to use an older document, try upgrading it to
8-
version 3 first with one of many available converters.
9-
10-
11-
## Prior work
12-
This project started as a fork of [openapi-python-client](https://github.com/openapi-generators/openapi-python-client). Pretty much all parts are heavily changed or completely replaced, but some lines of code still exist and we like to acknowledge the many good ideas we got from the original project :)
13-
14-
15-
## Support
16-
If you need support for this tool, [join our slack community](https://dlthub.com/community) and ask for help on the technical help channel. We're usually around to help you out or discuss features :)
3+
`dlt-init-openapi` generates [`dlt`](https://dlthub.com/docs) data pipelines from OpenAPI 3.x specs using the [`dlt` `rest_api` `verified source`](https://dlthub.com/docs/dlt-ecosystem/verified-sources/rest_api) to extract data from any REST API. If you are not familiar with `dlt` or our `verified sources`, please read:
174

5+
* [Getting started](https://dlthub.com/docs/getting-started) to learn the `dlt` basics.
6+
* [dlt rest_api](https://dlthub.com/docs/dlt-ecosystem/verified-sources/rest_api) to learn how our `rest_api` source works.
7+
* We also have a cool [Google Colab example](https://colab.research.google.com/drive/1MRZvguOTZj1MlkEGzjiso8lQ_wr1MJRI?usp=sharing#scrollTo=LHGxzf1Ev_yr) that demonstrates this generator.
188

199
## Features
20-
The dlt-init-openapi generates code from an OpenAPI spec that you can use to extract data from a `rest_api` into any [`destination`](https://dlthub.com/docs/dlt-ecosystem/destinations/) (e.g. Postgres, BigQuery, Redshift...) `dlt` supports.
21-
22-
Features include
23-
24-
* **[Pagination](https://dlthub.com/docs/dlt-ecosystem/verified-sources/rest_api#pagination) discovery** for each endpoint
25-
* **Primary key discovery** for each entity
26-
* **Endpoint relationship mapping** into `dlt` [`transformers`](https://dlthub.com/docs/general-usage/resource#process-resources-with-dlttransformer) (e.g. /users/ -> /user/{id})
27-
* **Payload JSON path [data selector](https://dlthub.com/docs/dlt-ecosystem/verified-sources/rest_api#data-selection) discovery** for results nested in the returned json
28-
* **[Authentication](https://dlthub.com/docs/dlt-ecosystem/verified-sources/rest_api#authentication)** discovery for an API
10+
`dlt-init-openapi` generates code from an OpenAPI spec that you can use to extract data from a REST API into any [`destination`](https://dlthub.com/docs/dlt-ecosystem/destinations/) (e.g., Postgres, BigQuery, Redshift...) that `dlt` supports. It additionally runs a set of heuristics to discover information not explicitly defined in OpenAPI specs.
2911

30-
## Setup
12+
Features include:
3113

32-
You will need Python 3.9 or higher installed, as well as pip.
14+
* **[Pagination](https://dlthub.com/docs/dlt-ecosystem/verified-sources/rest_api#pagination) discovery** for each endpoint.
15+
* **Primary key discovery** for each entity.
16+
* **Endpoint relationship mapping** into `dlt` [`transformers`](https://dlthub.com/docs/general-usage/resource#process-resources-with-dlttransformer) (e.g., /users/ -> /user/{id}).
17+
* **Payload JSON path [data selector](https://dlthub.com/docs/dlt-ecosystem/verified-sources/rest_api#data-selection) discovery** for results nested in the returned JSON.
18+
* **[Authentication](https://dlthub.com/docs/dlt-ecosystem/verified-sources/rest_api#authentication)** discovery for an API.
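
To make the endpoint relationship mapping more concrete, here is a minimal sketch of the idea behind `/users/ -> /user/{id}`: each row returned by the parent endpoint supplies a value for the placeholder in the child path. The helper `resolve_child_paths` and the sample rows are hypothetical and only illustrate the concept; they are not the generator's or the `rest_api` source's actual implementation.

```python
from typing import Any, Dict, Iterable, List


def resolve_child_paths(
    parent_rows: Iterable[Dict[str, Any]], path_template: str, field: str
) -> List[str]:
    """Fill the placeholder in the child path with one field value per parent row."""
    return [path_template.format(**{field: row[field]}) for row in parent_rows]


# Made-up rows, standing in for what a parent endpoint such as /users/ might return.
users = [{"id": 1, "name": "ada"}, {"id": 2, "name": "grace"}]

# Each parent row yields one request path for the child endpoint.
child_paths = resolve_child_paths(users, "/user/{id}", field="id")
print(child_paths)  # ['/user/1', '/user/2']
```

In a generated pipeline, this relationship is expressed declaratively in the configuration dictionary rather than in hand-written code like the above.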
3319

34-
```console
35-
# 1. install this tool locally
36-
$ pip install dlt-init-openapi
20+
## Support
21+
If you need support for this tool or `dlt`, please [join our Slack community](https://dlthub.com/community) and ask for help on the technical help channel. We're usually around to help you out or discuss features :)
3722

38-
# 2. Show the version of the installed package to verify it worked
39-
$ dlt-init-openapi --version
40-
```
23+
## A quick example
4124

42-
## Basic Usage
25+
You will need Python 3.9 or higher installed, as well as pip. You can run `pip install dlt-init-openapi` to install the current version.
4326

44-
Let's create an example pipeline from the [PokeAPI spec](https://raw.githubusercontent.com/cliffano/pokeapi-clients/ec9a2707ef2a85f41b747d8df013e272ef650ec5/specification/pokeapi.yml). You can point to any other OpenAPI Spec instead if you like.
27+
We will create a simple example pipeline from a [PokeAPI spec](https://pokeapi.co/) in our repo. You can point to any other OpenAPI Spec instead if you prefer.
4528

4629
```console
47-
# 1.a. Run the generator with an url:
48-
$ dlt-init-openapi pokemon --url https://raw.githubusercontent.com/cliffano/pokeapi-clients/ec9a2707ef2a85f41b747d8df013e272ef650ec5/specification/pokeapi.yml
30+
# 1.a. Run the generator with a URL:
31+
$ dlt-init-openapi pokemon --url https://raw.githubusercontent.com/dlt-hub/dlt-init-openapi/devel/tests/cases/e2e_specs/pokeapi.yml --global-limit 2
4932

5033
# 1.b. If you have a local file, you can use the --path flag:
5134
$ dlt-init-openapi pokemon --path ./my_specs/pokeapi.yml
5235

53-
# 2. You can now pick the endpoints you need from the popup
36+
# 2. You can now pick both of the endpoints from the popup.
5437

55-
# 3. After selecting your pokemon endpoints and hitting Enter, your pipeline will be rendered
38+
# 3. After selecting your Pokemon endpoints and hitting Enter, your pipeline will be rendered.
5639

57-
# 4. If you have any kind of authentication on your pipeline (this example has not), open the `.dlt/secrets.toml` and provide the credentials. You can find further settings in the `.dlt/config.toml`.
40+
# 4. If you have any kind of authentication on your pipeline (this example does not), open the `.dlt/secrets.toml` and provide the credentials. You can find further settings in the `.dlt/config.toml`.
5841

59-
# 5. Go to the created pipeline folder and run your pipeline
42+
# 5. Go to the created pipeline folder and run your pipeline.
6043
$ cd pokemon-pipeline
6144
$ PROGRESS=enlighten python pipeline.py # we use enlighten for a nice progress bar :)
6245

63-
# 6. Print the pipeline info to console to see what got loaded
46+
# 6. Print the pipeline info to the console to see what got loaded.
6447
$ dlt pipeline pokemon_pipeline info
6548

66-
# 7. You can now also install streamlit to see a preview of the data
49+
# 7. You can now also install Streamlit to see a preview of the data; you should have loaded 40 Pokemon and their details.
6750
$ pip install pandas streamlit
6851
$ dlt pipeline pokemon_pipeline show
6952

70-
# 8. You can go to our docs at https://dlthub.com/docs to learn how modify the generated pipeline to load to many destinations, place schema contracts on your pipeline and many other things.
53+
# 8. You can go to our docs at https://dlthub.com/docs to learn how to modify the generated pipeline to load to many destinations, place schema contracts on your pipeline, and many other things.
54+
55+
# NOTE: We used the `--global-limit 2` CLI flag to limit the requests to the PokeAPI for this example. This way, the Pokemon collection endpoint only gets queried twice, resulting in 2 x 20 Pokemon details being rendered.
7158
```
7259

7360
## What will be created?
61+
7462
When you run the `init` command above, the following files will be generated:
7563

76-
* `./pokemon-pipeline` - a folder containing the full project.
77-
* `./pokemon-pipeline/pipeline.py` - a file which you can execute to run your pipeline.
78-
* `./pokemon-pipeline/pokemon/__init__.py` - a file that contains the generated code to connect to the PokeApi, you can inspect this file and manually change it to your liking or to fix incorrectly generated results.
79-
* `./pokemon-pipeline/.dlt` - a folder with the `config.toml`. You can add your `secrets.toml` with credentials here.
80-
* `./pokemon-pipeline/rest_api` - a folder that contains the rest_api source from our verified sources.
64+
```
65+
pokemon_pipeline/
66+
├── .dlt/
67+
│ ├── config.toml # dlt config, learn more at dlthub.com/docs
68+
│ └── secrets.toml # your secrets, only needed for APIs with auth
69+
├── pokemon/
70+
│ └── __init__.py # your rest_api dictionary, learn more below
71+
├── rest_api/
72+
│ └── ... # rest_api copied from our verified sources repo
73+
├── .gitignore
74+
├── pokemon_pipeline.py # your pipeline file that you can execute
75+
├── README.md # a list of your endpoints with some additional info
76+
└── requirements.txt # the pip requirements for your pipeline
77+
```
8178

8279
> If you re-generate your pipeline, you will be prompted to continue if this folder exists. If you select yes, all generated files will be overwritten. All other files you may have created will remain in this folder.
8380
84-
## CLI commands
85-
86-
```console
87-
$ dlt-init-openapi [OPTIONS] <source_name> [ARGS]
88-
# example:
89-
$ dlt-init-openapi pokemon --path ./path/to/my_spec.yml
81+
## A closer look at your rest_api dictionary in pokemon/__init__.py
82+
83+
This file contains the configuration dictionary for the [dlt rest_api](https://dlthub.com/docs/dlt-ecosystem/verified-sources/rest_api) source, which is the main result of running this generator. For our Pokemon example, we have used an OpenAPI 3 spec that works out of the box. The contents of this dictionary depend on the quality of the spec you are using, whether the API you are querying actually adheres to this spec, and whether our heuristics manage to find the right values. You can edit this file to adapt the behavior of the dlt rest_api accordingly. Please read our [dlt rest_api](https://dlthub.com/docs/dlt-ecosystem/verified-sources/rest_api) docs to learn how to do this and play with our detailed [Google Colab example](https://colab.research.google.com/drive/1MRZvguOTZj1MlkEGzjiso8lQ_wr1MJRI?usp=sharing#scrollTo=LHGxzf1Ev_yr).
84+
85+
The generated dictionary will look something like this:
86+
87+
```python
88+
{
89+
"client": {
90+
"base_url": base_url,
91+
# -> the detected common paginator
92+
"paginator": {
93+
...
94+
},
95+
},
96+
# -> your two endpoints
97+
"resources": [
98+
{
99+
# -> A primary key could not be inferred from
100+
# the spec; usual suspects such as id, pokemon_id, etc.
101+
# are not defined. You can add one if you know.
102+
"name": "pokemon_list",
103+
"table_name": "pokemon",
104+
"endpoint": {
105+
# -> the results seem to be nested in { results: [...] }
106+
"data_selector": "results",
107+
"path": "/api/v2/pokemon/",
108+
},
109+
},
110+
{
111+
"name": "pokemon_read",
112+
"table_name": "pokemon",
113+
# -> A primary key *name* is assumed, as it is found in the
114+
# url.
115+
"primary_key": "name",
116+
"write_disposition": "merge",
117+
"endpoint": {
118+
"data_selector": "$",
119+
"path": "/api/v2/pokemon/{name}/",
120+
"params": {
121+
# -> your detected transformer settings
122+
# this is a child endpoint of the pokemon_list
123+
"name": {
124+
"type": "resolve",
125+
"resource": "pokemon_list",
126+
"field": "name",
127+
},
128+
},
129+
},
130+
},
131+
],
132+
}
90133
```
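
Because the generated configuration is a plain Python dictionary, adjusting it (for example, adding a primary key the heuristics could not infer) is ordinary dictionary manipulation. The following is a minimal sketch under stated assumptions: the trimmed-down `config`, the base URL value, and the helper `set_primary_key` are all hypothetical stand-ins, and whether `name` is a suitable key for your API is something you would need to verify yourself.

```python
from typing import Any, Dict

# A trimmed-down stand-in for a generated dictionary (not the full generated output).
config: Dict[str, Any] = {
    "client": {"base_url": "https://pokeapi.co"},  # assumed value, for illustration only
    "resources": [
        {
            "name": "pokemon_list",
            "table_name": "pokemon",
            "endpoint": {"data_selector": "results", "path": "/api/v2/pokemon/"},
        },
    ],
}


def set_primary_key(cfg: Dict[str, Any], resource_name: str, key: str) -> None:
    """Set `primary_key` on the resource with the given name, in place."""
    for resource in cfg["resources"]:
        if resource["name"] == resource_name:
            resource["primary_key"] = key
            return
    raise KeyError(f"no resource named {resource_name!r}")


# Add the key that could not be inferred from the spec.
set_primary_key(config, "pokemon_list", "name")
print(config["resources"][0]["primary_key"])  # name
```

In practice you would simply edit the dictionary literal in `pokemon/__init__.py` by hand; the helper only shows which field is involved.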
91134

92-
**Options**:
93-
94-
- `--version`: Print the version and exit
95-
- `--help`: Show this message and exit.
96-
97-
**Commands**:
98-
99-
- `init`: Generate a new `dlt` `rest_api` `source`
100-
101-
### `dlt-init-openapi`
102-
103-
Generate a new `dlt` `rest_api` `source`
104-
105-
**Usage**:
135+
## CLI command
106136

107137
```console
108-
$ dlt-init-openapi pokemon --path ./path/to/my_spec.yml
138+
$ dlt-init-openapi <source_name> [OPTIONS]
139+
# example:
140+
$ dlt-init-openapi pokemon --path ./path/to/my_spec.yml --no-interactive --output-path ./my_pipeline
109141
```
110142

111143
**Options**:
112144

113-
- `--url URL`: A url to read the OpenAPI JSON or YAML file from
114-
- `--path PATH`: A path to read the OpenAPI JSON or YAML file from locally
115-
- `--output-path PATH`: A path to render the output to
116-
- `--config PATH`: Path to the config file to use (see below)
145+
_The only required option is either a path or a URL to a spec._
146+
147+
- `--url URL`: A URL to read the OpenAPI JSON or YAML file from.
148+
- `--path PATH`: A path to read the OpenAPI JSON or YAML file from locally.
149+
- `--output-path PATH`: A path to render the output to.
150+
- `--config PATH`: Path to the config file to use (see below).
117151
- `--no-interactive`: Skip endpoint selection and render all paths of the OpenAPI spec.
118-
- `--log-level`: Set logging level for stdout output, defaults to 20 (INFO).
152+
- `--log-level`: Set the logging level for stdout output, defaults to 20 (INFO).
119153
- `--global-limit`: Set a global limit on the generated source.
120154
- `--update-rest-api-source`: Update the locally cached rest_api verified source.
121-
- `--allow-openapi-2`: Allow to use OpenAPI v2. specs. Migration of the spec to 3.0 is recommended though.
122-
- `--version`: Show installed version of the generator.
155+
- `--allow-openapi-2`: Allows the use of OpenAPI v2 specs. Migrating the spec to 3.0 is recommended for better results, though.
158+
- `--version`: Show the installed version of the generator and exit.
123159
- `--help`: Show this message and exit.
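
If you want to call the generator from a script (for instance in CI), one option is to assemble the command from the options listed above and pass it to `subprocess`. This is only a sketch: the helper `build_command` is hypothetical, it uses just the `--path`, `--no-interactive`, and `--output-path` flags documented here, and the actual `subprocess.run` call is left commented out because it requires `dlt-init-openapi` to be installed.

```python
import shlex
from typing import List, Optional


def build_command(
    source_name: str,
    spec_path: str,
    output_path: Optional[str] = None,
    interactive: bool = True,
) -> List[str]:
    """Assemble a dlt-init-openapi invocation as an argument list."""
    cmd = ["dlt-init-openapi", source_name, "--path", spec_path]
    if not interactive:
        # Skip endpoint selection and render all paths of the spec.
        cmd.append("--no-interactive")
    if output_path:
        cmd += ["--output-path", output_path]
    return cmd


cmd = build_command(
    "pokemon", "./path/to/my_spec.yml", output_path="./my_pipeline", interactive=False
)
print(shlex.join(cmd))

# To actually run the generator (requires dlt-init-openapi to be installed):
# import subprocess
# subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting issues with paths that contain spaces.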
124160

125161
## Config options
126162
You can pass a path to a config file with the `--config PATH` argument. To see available config values, go to https://github.com/dlt-hub/dlt-init-openapi/blob/devel/dlt_init_openapi/config.py and read the information below each field on the `Config` class.
127163

128-
The config file can be supplied as json or yaml dictionary. For example to change the package name, you can create a yaml file:
164+
The config file can be supplied as a JSON or YAML dictionary. For example, to change the package name, you can create a YAML file:
129165

130166
```yaml
131167
# config.yml
@@ -141,6 +177,10 @@ $ dlt-init-openapi pokemon --url ... --config config.yml
141177
## Telemetry
142178
We track your usage of this tool similar to how we track other commands in the dlt core library. Read more about this and how to disable it here: https://dlthub.com/docs/reference/telemetry.
143179

180+
## Prior work
181+
This project started as a fork of [openapi-python-client](https://github.com/openapi-generators/openapi-python-client). Pretty much all parts are heavily changed or completely replaced, but some lines of code still exist, and we would like to acknowledge the many good ideas we got from the original project :)
182+
144183
## Implementation notes
145-
* OAuth Authentication currently is not natively supported, you can supply your own
146-
* Per endpoint authentication currently is not supported by the generator, only the first globally set securityScheme will be applied. You can add your own per endpoint if you need to.
184+
* OAuth Authentication currently is not natively supported. You can supply your own.
185+
* Per-endpoint authentication is currently not supported by the generator. Only the first globally set securityScheme will be applied. You can add your own per endpoint if you need to.
186+
* Basic OpenAPI 2.0 support is implemented. We recommend updating your specs at https://editor.swagger.io before using `dlt-init-openapi`.
