Scripts for training models on Idun #28
Open
kluge7 wants to merge 21 commits into main from 13-traning-on-idun
Commits
4329e56 adding basic slurm file for job submission (Senja20)
be4ce19 updated the ci installation of requirenmetns to use dependencies from… (Senja20)
a1480e0 ✨ feat: Push the model to the hub (Senja20)
3b4ca9f 📌 update the requirements.txt (Senja20)
daaa619 ➖ simplify requirements.txt by removing unused stuff (Senja20)
a241df0 🔥 remove hugging face (Senja20)
bcff411 feat: update slurm file (Senja20)
b0604d3 ✨ using the pt format for model storage (Senja20)
80648c9 Merge branch '13-traning-on-idun' of github.com:vortexntnu/vortex-ima… (Senja20)
c1d49c3 🔧 remove redundent steps from slurm file (Senja20)
3f545ba ✨ feat: Update YOLO model training parameters (Senja20)
ad41066 ➖ Update requirements.txt to remove unused dependencies (Senja20)
79c728d ➖ Update requirements.txt to remove unused dependencies (Senja20)
ade6626 🔥 Update Job.slurm to install protobuf package (Senja20)
793a882 ➕ Update protobuf package version in requirements.txt (Senja20)
4cfe24f ✨ feat: Enhance Job.slurm for improved environment setup and package … (Senja20)
0a28975 added yolo roboflow training script (vortexuser)
d07aa1a unet training script (vortexuser)
7a7cfaa update: added correct account name and time (VegraD)
5b186db Delete .github/workflows/pylint.yml (kluge7)
89102ab Delete .gitignore (kluge7)
@@ -0,0 +1,50 @@
+#!/bin/bash
+#SBATCH --partition=GPUQ
+#SBATCH --account=ie-idi
+#SBATCH --time=999:99:99
+#SBATCH --nodes=1
+#SBATCH --ntasks-per-node=4
+#SBATCH --gres=gpu:a100:4
+#SBATCH --constraint="gpu40g|gpu80g|gpu32g"
+#SBATCH --job-name="vortex-img-process"
+#SBATCH --output=vortex_img_process_log.out
+#SBATCH --mem=32G
+
+export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/cluster/apps/eb/software/Python/3.10.4-GCCcore-11.3.0/lib/
+
+set -e
+
+module purge
+module --ignore_cache load foss/2022a
+module --ignore_cache load Python/3.10.4-GCCcore-11.3.0
+
+pip cache purge
+
+# makes sure that the pip is up to date
+python3 -m pip install --upgrade pip
+
+# Create a temporary virtual environment
+VENV_DIR=$(mktemp -d -t env-repaint-XXXXXXXXXX)
+python3 -m venv $VENV_DIR
+source $VENV_DIR/bin/activate
+
+pip install --upgrade pip
+
+# install the required packages
+pip install -r requirements.txt
+#pip install pyyaml # used to read the configuration file
+#pip install blobfile # install blobfile to download the dataset
+#pip install kagglehub # install kagglehub to download the dataset
+pip install --force-reinstall torch -U
+pip install torchvision torchaudio
+#pip install diffusers transformers accelerate --user
+
+# Mixing expandable_segments:True with max_split_size doesn't make sense because the expandable segment is the size of RAM and so it could never be split with max_split_size.
+# export PYTORCH_CUDA_ALLOC_CONF="expandable_segments:True,max_split_size_mb:128"
+export PYTORCH_CUDA_ALLOC_CONF="expandable_segments:True"
+
+python3 train.py
+
+# Deactivate and remove the virtual environment
+deactivate
+rm -rf $VENV_DIR
@@ -0,0 +1,10 @@
+huggingface_hub==0.23.3
+numpy==1.26.4
+opencv_contrib_python==4.9.0.80
+opencv_python==4.9.0.80
+pafy==0.5.5
+python-dotenv==1.0.1
+roboflow==1.1.24
+torch==2.2.1
+protobuf==4.24.0
+ultralytics==8.0.196
Binary file not shown.
@@ -0,0 +1,50 @@
+#!/bin/bash
+#SBATCH --partition=GPUQ
+#SBATCH --account=ie-idi
+#SBATCH --time=999:99:99
+#SBATCH --nodes=1
+#SBATCH --ntasks-per-node=4
+#SBATCH --gres=gpu:a100:4
+#SBATCH --constraint="gpu40g|gpu80g|gpu32g"
+#SBATCH --job-name="vortex-img-process"
+#SBATCH --output=vortex_img_process_log.out
+#SBATCH --mem=32G
+
+export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/cluster/apps/eb/software/Python/3.10.4-GCCcore-11.3.0/lib/
+
+set -e
+
+module purge
+module --ignore_cache load foss/2022a
+module --ignore_cache load Python/3.10.4-GCCcore-11.3.0
+
+pip cache purge
+
+# makes sure that the pip is up to date
+python3 -m pip install --upgrade pip
+
+# Create a temporary virtual environment
+VENV_DIR=$(mktemp -d -t env-repaint-XXXXXXXXXX)
+python3 -m venv $VENV_DIR
+source $VENV_DIR/bin/activate
+
+pip install --upgrade pip
+
+# install the required packages
+pip install -r requirements.txt
+#pip install pyyaml # used to read the configuration file
+#pip install blobfile # install blobfile to download the dataset
+#pip install kagglehub # install kagglehub to download the dataset
+pip install --force-reinstall torch -U
+pip install torchvision torchaudio
+#pip install diffusers transformers accelerate --user
+
+# Mixing expandable_segments:True with max_split_size doesn't make sense because the expandable segment is the size of RAM and so it could never be split with max_split_size.
+# export PYTORCH_CUDA_ALLOC_CONF="expandable_segments:True,max_split_size_mb:128"
+export PYTORCH_CUDA_ALLOC_CONF="expandable_segments:True"
+
+python3 train.py
+
+# Deactivate and remove the virtual environment
+deactivate
+rm -rf $VENV_DIR
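Because the job script runs with `set -e`, a failure in `pip install` or `python3 train.py` ends the job before the final `deactivate` and `rm -rf $VENV_DIR` lines, so the temporary environment may be left behind. One possible way to make the cleanup unconditional is an `EXIT` trap. The sketch below is only an illustration: `run_with_cleanup` and `WORK_DIR` are hypothetical names that do not appear in this PR, and `false` stands in for a failing training step.

```shell
#!/bin/bash
# Sketch only: run_with_cleanup is a hypothetical helper, not part of this PR.
# It runs a command in a subshell and deletes the given directory when that
# subshell exits, whether the command succeeded or failed.
run_with_cleanup() {
    local dir="$1"   # directory to delete afterwards
    shift            # remaining arguments: the command to run
    (
        # The EXIT trap fires when the subshell ends, on success or failure.
        trap 'rm -rf "$dir"' EXIT
        "$@"
        exit $?      # pass the command's exit status back to the caller
    )
}

# Demo with a failing command standing in for "python3 train.py".
WORK_DIR=$(mktemp -d)
run_with_cleanup "$WORK_DIR" false || echo "step failed"
[ -d "$WORK_DIR" ] || echo "temporary directory was removed"
```

In the job script itself the same idea could be applied by setting the trap right after `VENV_DIR` is created, which would make the trailing `rm -rf` line redundant.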
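Invalid SLURM time format '999:99:99'. The value should use one of the formats Slurm accepts (minutes, minutes:seconds, hours:minutes:seconds, days-hours, days-hours:minutes, or days-hours:minutes:seconds) with in-range fields (e.g., minutes/seconds 0-59, and hours 0-23 when a day count is given).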
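A check along these lines could catch such a value before the job is submitted. This is a minimal sketch, not Slurm's own parser: `is_valid_slurm_time` is a hypothetical helper, the accepted shapes follow the formats listed in the sbatch documentation, and the range limits (minutes and seconds at most 59, hours at most 23 when a day count is present) are assumptions taken from the comment above. Slurm itself may be more lenient.

```shell
#!/bin/bash
# Sketch only: is_valid_slurm_time is a hypothetical helper, not a Slurm tool.
# Returns 0 if the argument looks like a well-formed, in-range --time value.
is_valid_slurm_time() {
    local value="$1" hours=0 minutes=0 seconds=0
    if [[ $value =~ ^[0-9]+-([0-9]+)(:([0-9]+))?(:([0-9]+))?$ ]]; then
        # days-hours[:minutes[:seconds]]
        hours=${BASH_REMATCH[1]}
        minutes=${BASH_REMATCH[3]:-0}
        seconds=${BASH_REMATCH[5]:-0}
        # Assumption: with an explicit day count, hours stay within 0-23.
        (( 10#$hours <= 23 )) || return 1
    elif [[ $value =~ ^[0-9]+:([0-9]+):([0-9]+)$ ]]; then
        # hours:minutes:seconds (no upper bound on hours is enforced here)
        minutes=${BASH_REMATCH[1]}
        seconds=${BASH_REMATCH[2]}
    elif [[ $value =~ ^[0-9]+:([0-9]+)$ ]]; then
        # minutes:seconds
        seconds=${BASH_REMATCH[1]}
    elif [[ $value =~ ^[0-9]+$ ]]; then
        # plain minutes
        return 0
    else
        return 1
    fi
    # 10# forces base 10 so values like "08" are not read as octal.
    (( 10#$minutes <= 59 && 10#$seconds <= 59 ))
}

for t in "999:99:99" "2-00:00:00" "72:00:00"; do
    if is_valid_slurm_time "$t"; then
        echo "$t: accepted"
    else
        echo "$t: rejected"
    fi
done
```

The helper only checks the shape of the string. Whether a given limit is allowed still depends on the partition's configured maximum, which this sketch does not know about.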