The main setup you'll need for this lab is:
- Python >= 3.10
- Python packages:
  - FastAPI
  - uvicorn
  - Airflow
A detailed list of requirements can be found ...
The lab was tested with Python 3.10.12, in the WSL environment of a Windows 11 PC.
- Install the Docker engine on your computer.
  - On Linux you can install the engine directly. Instructions can be found here.
  - On Windows or macOS, install Docker Desktop (this can also be installed on Linux, but I prefer installing just the engine).
  - If using Docker Desktop, start it once it's installed.
- Run the Airflow containers via Docker Compose.
  - The instructions for how Docker sets up Airflow are all in a `docker-compose.yaml` file you can find in the lab's repo.
  - Alternatively, you can find the latest `docker-compose.yaml` for Airflow here, e.g. `curl -LfO 'https://airflow.apache.org/docs/apache-airflow/2.10.3/docker-compose.yaml'`
  - Open a terminal in the same location as `docker-compose.yaml` and run the command `docker compose up`.
There have been a few bugs with this approach, so it is no longer recommended!
The Airflow standalone we will be using can only be run through WSL, so we'll need to perform all the installations in this WSL environment.
- Install WSL.
  - In Windows 11 you can do so by opening PowerShell and typing `wsl --install`.
  - In older Windows versions the process might be slightly different. More info can be found here.
- Start a WSL terminal.
- Inside the WSL terminal, follow the same installation instructions as the Linux section.
  - This will create a virtual environment and install the necessary libraries.
- Make sure you have a working installation of Python.
  - I don't recommend using distributions like Anaconda.
  - The easiest way to check is to open a terminal and try to launch the interpreter (usually with the command `python` or `python3`).
- [Recommended] Create a virtual environment for our lab.
  - This is cleaner than installing packages into the system's default interpreter.
  - You can create one by typing `python3 -m venv ml-env`. This will create a directory called `ml-env` which will host your virtual environment.
  - To activate the virtual environment, type `source ml-env/bin/activate`.
  - Once activated, you will use and install packages in the virtual environment rather than the system interpreter.
  - It's good practice to upgrade pip (so that it can find newer versions of libraries): `pip install --upgrade pip`
- Install the necessary packages.
  - You can find all the packages along with their versions [here]. Download this file (or clone the whole repo).
  - To install all packages at once, run `pip install -r requirements.txt`. A quick way to verify the installation is sketched below.
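To quickly verify the installation, you can try importing the main packages and printing their versions, e.g. with a small script like this (my own sketch, not a file from the lab repo; it assumes FastAPI, uvicorn, and Airflow were installed via `requirements.txt`):

```python
# check_install.py - quick sanity check that the main lab packages import
# (a hypothetical helper of mine, not part of the lab repo).
# Run it inside the activated ml-env virtual environment.
import airflow
import fastapi
import uvicorn

# If any of the imports above fails, that package is missing or broken.
print("FastAPI:", fastapi.__version__)
print("uvicorn:", uvicorn.__version__)
print("Airflow:", airflow.__version__)
```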
- You can find a dummy FastAPI script to test that everything is working ... (a minimal sketch is also shown after this list).
- Use the uvicorn command to launch the server. The syntax is `uvicorn script_name:app_name`, e.g. `uvicorn dummy_api:app`.
- Open a browser and type `localhost:8000`. You should see the root page of your API.
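If you don't have the repo's dummy script at hand, a minimal app of the same shape looks like this (a sketch assuming the file is named `dummy_api.py` and the app object is called `app`, matching the `uvicorn dummy_api:app` example above; the repo's actual script may differ):

```python
# dummy_api.py - minimal FastAPI app for smoke-testing the setup
# (a sketch; the lab repo's dummy script may look different).
from fastapi import FastAPI

app = FastAPI()


@app.get("/")
def read_root():
    # FastAPI serializes the returned dict to JSON automatically.
    return {"message": "Hello! The API is up and running."}
```

Launching `uvicorn dummy_api:app` and visiting `localhost:8000` should then return this JSON message.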
- If you're using docker:
  - Open a terminal and go to the location of `docker-compose.yaml`.
  - Run `docker compose up` and wait a bit until it says that the webserver is running.
  - In this case, log in with the default credentials: username='airflow', password='airflow'.
- If using the standalone:
  - If properly installed, you should be able to run Airflow by typing `airflow standalone`.
  - This command will start up all Airflow components (i.e. the scheduler, the webserver, etc.).
  - The username and password for Airflow will be displayed in the terminal.
- Open a browser and type `localhost:8080`. You should see the Airflow UI.
- Do not close the terminal for as long as you want Airflow to run.
- To stop it, simply press CTRL+C in the terminal where you launched the standalone.
- You can find a dummy DAG to test your installation; a minimal example is also sketched after this list.
- There is a default location where you need to place DAG files in order to run them.
  - In the docker installation, this should be inside the same directory as the `docker-compose.yaml`.
  - In the standalone, this is controlled by the environment variable `AIRFLOW_HOME`. By default, it is `~/airflow`.
- Inside this location, you can create a directory called `dags`, under which you can place your DAGs.
- The DAG should appear in the UI shortly after.
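For reference, a test DAG can be as small as the following (a sketch with placeholder names like `dummy_dag`, assuming Airflow 2.x imports; the repo's dummy DAG may differ). For the standalone setup it would go in `~/airflow/dags/`:

```python
# dags/dummy_dag.py - minimal DAG to verify the installation
# (a sketch; the dag and task names are placeholders of mine).
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def say_hello():
    print("Hello from Airflow!")


with DAG(
    dag_id="dummy_dag",
    start_date=datetime(2024, 1, 1),
    schedule=None,   # no schedule; trigger the DAG manually from the UI
    catchup=False,
) as dag:
    PythonOperator(task_id="say_hello", python_callable=say_hello)
```

After placing the file, the DAG should show up in the UI's list shortly, and you can trigger it manually to confirm the scheduler runs tasks.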
We will use the PyCharm IDE for this lab, which can be found at the following link: Download PyCharm. Install it once downloaded.
Feel free to use any other IDE that you are more comfortable with.
It will be helpful to set up git on your PC so that you can easily clone repos.