This repository contains a Dockerized ML project that trains a DistilBERT model on the MRPC task. It provides instructions for setting up and running the project, either automatically or manually, both locally with Docker Desktop and on Docker Playground.
- Docker (Installation Guide)
- Git (Installation Guide)
The repository includes two scripts for automating the setup process:
- docker_setup_desktop_linux.sh for Linux users
- docker_setup_desktop_windows.ps1 for Windows users
These scripts handle cloning the repository, building the Docker image, and running the Docker container.
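As a rough orientation, the three steps these scripts automate could be sketched as follows. This is not the content of the actual scripts: the `run_step` helper, the `setup` function, and the `DRY_RUN` switch are hypothetical, and only the repository URL, image name, and port mapping are taken from the manual setup commands in this README. By default the sketch only prints the commands instead of running them.

```shell
#!/bin/sh
# Hypothetical sketch of the clone / build / run sequence.
# It does not reproduce the real setup scripts.
set -e

REPO_URL="https://github.com/digwit678/Project_2_Docker.git"
IMAGE_NAME="project2_docker"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them

run_step() {
    # Print the command; execute it only when DRY_RUN=0.
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

setup() {
    run_step git clone "$REPO_URL"
    run_step cd Project_2_Docker
    run_step docker build -t "$IMAGE_NAME" .
    run_step docker run -p 6006:6006 "$IMAGE_NAME"
}

setup
```

Running the sketch with `DRY_RUN=0` would execute the same commands that the manual setup lists one by one.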
The base folder of the GitHub repository includes:
- README.md: Documentation file with setup instructions.
- requirements.txt: List of Python packages required for the project.
- start.sh: Script to start the training and TensorBoard logging.
- docker_playground/: Directory with files necessary for running the project in Docker Playground.
- lightning_logs/: Directory where TensorBoard logs will be stored.
- DistilBERT_MRPC_Script.py: Python script for training the DistilBERT model.
- docker_setup_desktop_linux.sh: Bash script to automate the setup process on Linux.
- docker_setup_desktop_windows.ps1: PowerShell script to automate the setup process on Windows.
- Dockerfile: Configuration file for creating the Docker image.
- Choose the setup script for your operating system (docker_setup_desktop_linux.sh or docker_setup_desktop_windows.ps1) and download it directly from GitHub:
  a) Click the chosen script in the repository's main folder.
  b) In the upper right corner, below your account symbol, open the "..." menu ("More file actions").
  c) Choose Download to download only the selected setup script.
- Open a terminal (PowerShell on Windows, Terminal on Linux).
- Make sure Docker is running.
- Navigate to the directory where you want to clone the repository and place the setup script in this folder.
- Execute the setup script:
  - For Linux:
    bash docker_setup_desktop_linux.sh
  - For Windows: right-click the docker_setup_desktop_windows.ps1 file and select Run with PowerShell.
- Follow the prompts in the terminal to complete the setup.
- After the setup script finishes, the Docker container starts, and you can access TensorBoard at http://localhost:6006 to monitor the training progress.
  Ensure that port 6006 is not used by another service and is not blocked by your firewall (for instructions on changing the TensorBoard logging port, refer to the Troubleshooting section below).
- Open a terminal.
- Execute the following commands:
  a) Clone the repository:
     git clone https://github.com/digwit678/Project_2_Docker.git
  b) Navigate to the repository directory:
     cd Project_2_Docker
  c) Build the Docker image:
     docker build -t project2_docker .
  d) Run the Docker container:
     docker run -p 6006:6006 project2_docker
  e) Access TensorBoard at http://localhost:6006
- Docker Build Fails: Ensure Docker is running and you have internet connectivity.
- TensorBoard Not Accessible: Check the Docker container status and port mapping.
- Change TensorBoard Port: If port 6006 cannot be used, update the port mapping in the docker run command (docker run -p 6006:6006 project2_docker) and the --port value in start.sh (tensorboard --logdir=/usr/src/app/lightning_logs --port=6006 --bind_all &) to use a different port.
- Error when running the automated Windows PowerShell setup script: '/docker_setup_desktop_windows.ps1' is not recognized as an internal or external command
  a) Check the execution policy:
     Get-ExecutionPolicy
     If the policy is set to Restricted, you need to change it to allow script execution.
  b) Change the execution policy:
     Set-ExecutionPolicy RemoteSigned
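For the port-related items above, a lighter alternative to editing start.sh is to change only the host side of the `-p HOST:CONTAINER` mapping, since TensorBoard keeps listening on 6006 inside the container. The sketch below builds that command for a chosen host port. The `tb_run_command` helper is hypothetical; the image name and container port are the ones used in this README.

```shell
# Hypothetical helper: print the docker run command for a given host port.
# The container side stays at 6006, matching --port=6006 in start.sh.
tb_run_command() {
    host_port="${1:-6006}"
    case "$host_port" in
        *[!0-9]*)
            echo "invalid port: $host_port" >&2
            return 1
            ;;
    esac
    if [ "$host_port" -lt 1 ] || [ "$host_port" -gt 65535 ]; then
        echo "port out of range: $host_port" >&2
        return 1
    fi
    echo "docker run -p ${host_port}:6006 project2_docker"
}

# Print the command for host port 6007; TensorBoard would then be
# reachable at http://localhost:6007.
tb_run_command 6007
```

To check whether the container is running and which ports it publishes, `docker ps` shows both in its STATUS and PORTS columns.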
For Docker Playground, I have optimized the setup to work within the resource constraints typically found in such environments. The docker_playground directory contains only the essential files needed to run the training script on the MRPC task using the DistilBERT model.
- msr_paraphrase_test.txt and msr_paraphrase_train.txt: These text files contain the MRPC dataset used for training and evaluating the model.
- requirements.txt: A list of Python packages required for running the project. This file has been optimized to exclude unnecessary packages to save memory.
- start.sh: A shell script that starts TensorBoard and executes the Python training script.
- Task_3_DistilBERT_MRPC_Script.py: The Python script that trains the DistilBERT model. It has been adapted to work with local text files instead of using the datasets library.
- lightning_logs: A directory that TensorBoard uses to log training progress.
- docker_setup_playground.sh: A shell script that automates the process of setting up and running the Docker container in Docker Playground.
- Dockerfile: A Dockerfile configured specifically for Docker Playground, with adjustments made to work within the platform's resource limitations.
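Because the Playground variant reads the dataset from local text files, a quick sanity check of those files can be useful before starting a training run. The sketch below assumes the standard MSR Paraphrase Corpus layout (tab-separated, one header row, the 0/1 label in the first column); it has not been verified against the files in this repository. The `label_counts` helper and the tiny stand-in file are hypothetical.

```shell
# Build a tiny stand-in file in the assumed layout:
# Quality <TAB> #1 ID <TAB> #2 ID <TAB> #1 String <TAB> #2 String
sample="$(mktemp)"
printf 'Quality\t#1 ID\t#2 ID\t#1 String\t#2 String\n' > "$sample"
printf '1\t101\t102\tThe cat sat.\tA cat was sitting.\n' >> "$sample"
printf '0\t103\t104\tIt rained today.\tStocks fell sharply.\n' >> "$sample"
printf '1\t105\t106\tHe left early.\tHe departed early.\n' >> "$sample"

# Hypothetical helper: skip the header row, then count rows per label.
label_counts() {
    awk -F '\t' 'NR > 1 { n[$1]++ }
        END { printf "paraphrase=%d non_paraphrase=%d\n", n[1], n[0] }' "$1"
}

label_counts "$sample"
# prints: paraphrase=2 non_paraphrase=1

rm -f "$sample"
```

Under the same layout assumption, `label_counts msr_paraphrase_train.txt` would report the label balance of the training file.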
Visit Docker Playground and start a session.
Choose either to set it up automatically with the setup script or to follow the manual setup steps.
Automated setup:
- Drag and drop docker_setup_playground.sh, found in the Project_2_Docker/docker_playground/ directory, into the Docker Playground shell to upload it.
- Run the automated setup script in the current session's shell:
  sh docker_setup_playground.sh
Manual setup:
- Clone the repository and navigate to the docker_playground directory:
  git clone https://github.com/digwit678/Project_2_Docker.git
  cd Project_2_Docker/docker_playground
- Build the Docker image:
  docker build -t project2_docker .
- Run the Docker container:
  docker run -p 6006:6006 project2_docker