Commit

Merge pull request #17 from kevinsunny1996/task/comment_schedule
Updated readme and commented schedule
kevinsunny1996 authored May 10, 2024
2 parents 0754985 + f578dee commit d2180b5
Showing 5 changed files with 11 additions and 3 deletions.
Binary file removed Airflow_dag_steps
Binary file not shown.
4 changes: 4 additions & 0 deletions DAG Flow Diagram:Zone.Identifier
@@ -0,0 +1,4 @@
+[ZoneTransfer]
+ZoneId=3
+ReferrerUrl=https://excalidraw.com/
+HostUrl=https://excalidraw.com/
Binary file added DAG_Flow_Diagram
Binary file not shown.
4 changes: 3 additions & 1 deletion README.md
@@ -19,7 +19,7 @@ Each table will contain on an average `40 rows` of gaming related data so close
Flow Diagram
=============

-![Airflow DAG Tasks](Airflow_dag_steps)
+![Airflow DAG Tasks](DAG_Flow_Diagram)

Salient Features
================
@@ -40,6 +40,8 @@ This project created using astro cli contains the following parts:
- `rawg_api_extractor_dag`: This DAG walks through the EL (Extract And Load with slight transforms of flattening json data and enforcing datatype restrictions on dataframe columns) process of extracting data from RAWG API and loading it into Bigquery.

- The pipeline has the following sections :
+- #### Check Hibernation
+- `hibernation_check`: This step skips if the dag run is scheduled for a time closer to the hibernation schedule as we use dev deployment of Astro
- #### Extract Section:
- `get_rawg_api_game_ids`: Fetches a list of Game ID's. Uses the following parameters:
- `page_size`: How many results can be shown in a single call.
6 changes: 4 additions & 2 deletions dags/rawg_api_extractor_dag.py
@@ -315,8 +315,10 @@
default_args=default_args,
description='DAG to fetch RAWG API data from games/ endpoint, convert the JSON to CSV and upload to GCS and then load it in Bigquery',
# schedule=None,
-schedule_interval='*/3 * * * *',
-start_date=datetime(2023, 9, 1),
+# Commenting out interval as load is done in Bigquery
+# schedule_interval='*/3 * * * *',
+# To avoid DAG from triggering when it wakes up from hibernation at 8 UTC
+start_date=datetime(2023, 9, 1, 8, 2),
tags=['rawg_api_elt'],
catchup=False
)
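
The `hibernation_check` step described in the README and the 08:02 start time in this hunk both aim to keep runs away from the moment the deployment wakes up at 8 UTC. The commit does not show how the check is implemented, so the following is only a minimal sketch of one possible guard. The function name `should_run`, the constants `WAKE_TIME_UTC` and `BUFFER`, and the two-minute margin are assumptions chosen for illustration, not code from this repository.

```python
from datetime import datetime, time, timedelta, timezone

# Assumed values for illustration: the commit only mentions a wake-up at 8 UTC.
WAKE_TIME_UTC = time(8, 0)
BUFFER = timedelta(minutes=2)


def should_run(run_time: datetime,
               wake_time: time = WAKE_TIME_UTC,
               buffer: timedelta = BUFFER) -> bool:
    """Return False when a run falls within `buffer` of the daily wake-up.

    `run_time` is expected to be timezone-aware; it is normalised to UTC
    before the comparison.
    """
    run_time = run_time.astimezone(timezone.utc)
    # Today's wake-up moment (UTC) on the same calendar date as the run.
    wake = datetime.combine(run_time.date(), wake_time, tzinfo=timezone.utc)
    # Measure the distance to the wake-up in either direction.
    return abs(run_time - wake) >= buffer
```

In an Airflow DAG, a callable like this could plausibly be handed to a `ShortCircuitOperator`, which skips downstream tasks when its callable returns a falsy value. Whether the project wires the check this way is not shown in the diff.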