Refactor build / dist to use pyproject.toml (#909)
* Move training -> open_clip_train since it's being installed as import package in site-packages

* Remove .gitignore from train package

* Update root gitignore

* Update training module name to open_clip_train

* switch from setup.py -> pyproject.toml

* open_clip.__version__

* make [test] depend on [training]

* pip install .[test] for CI

* Change references training.main -> open_clip_train.main
rwightman authored Jul 4, 2024
1 parent 37b2c6b commit 2aebc88
Showing 38 changed files with 127 additions and 112 deletions.
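The rename from `training` to `open_clip_train` changes the import path for any downstream code that imported the training modules. A minimal sketch of how a caller might detect which name is installed, using only the standard library; the helper `find_training_package` is hypothetical and is not part of open_clip:

```python
import importlib.util
from typing import Optional, Sequence


def find_training_package(
    candidates: Sequence[str] = ("open_clip_train", "training"),
) -> Optional[str]:
    """Return the first top-level package name in `candidates` that is importable.

    Hypothetical helper for illustration only; it is not part of open_clip.
    The post-rename name is tried first, the pre-rename name second.
    """
    for name in candidates:
        # find_spec returns None for a top-level module that is not installed.
        if importlib.util.find_spec(name) is not None:
            return name
    return None


if __name__ == "__main__":
    # Prints the detected package name, or None if neither is installed.
    print(find_training_package())
```

Note that `training` is a generic name, so a match on the second candidate may be an unrelated package; that ambiguity is part of the motivation stated in the commit message for moving to `open_clip_train`.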
4 changes: 1 addition & 3 deletions .github/workflows/ci.yml
@@ -68,9 +68,7 @@ jobs:
run: |
python3 -m venv .env
source .env/bin/activate
-make install
-make install-test
-make install-training
+pip install -e .[test]
- name: Prepare test data
run: |
source .env/bin/activate
2 changes: 1 addition & 1 deletion .github/workflows/python-publish.yml
@@ -33,5 +33,5 @@ jobs:
TWINE_USERNAME: __token__
TWINE_PASSWORD: ${{ secrets.PYPI_PASSWORD }}
run: |
-python setup.py sdist bdist_wheel
+python -m build
twine upload dist/*
6 changes: 3 additions & 3 deletions .gitignore
@@ -1,5 +1,5 @@
-logs/
-wandb/
+**/logs/
+**/wandb/
models/
features/
results/
@@ -150,4 +150,4 @@ src/debug
core.*

# Allow
-!src/evaluation/misc/results_dbs/*
+!src/evaluation/misc/results_dbs/*
20 changes: 10 additions & 10 deletions README.md
@@ -169,7 +169,7 @@ Running regression tests against a specific git revision or tag:
### Sample single-process running code:
```bash
-python -m training.main \
+python -m open_clip_train.main \
--save-frequency 1 \
--zeroshot-frequency 1 \
--report-to tensorboard \
@@ -234,7 +234,7 @@ a job on a node of 4 GPUs:
```bash
cd open_clip/src
-torchrun --nproc_per_node 4 -m training.main \
+torchrun --nproc_per_node 4 -m open_clip_train.main \
--train-data '/data/cc12m/cc12m-train-{0000..2175}.tar' \
--train-num-samples 10968539 \
--dataset-type webdataset \
@@ -253,7 +253,7 @@ of nodes and host node.
cd open_clip/src
torchrun --nproc_per_node=4 \
--rdzv_endpoint=$HOSTE_NODE_ADDR \
--m training.main \
+-m open_clip_train.main \
--train-data '/data/cc12m/cc12m-train-{0000..2175}.tar' \
--train-num-samples 10968539 \
--dataset-type webdataset \
@@ -289,7 +289,7 @@ export MASTER_ADDR=$master_addr
cd /shared/open_clip
export PYTHONPATH="$PYTHONPATH:$PWD/src"
-srun --cpu_bind=v --accel-bind=gn python -u src/training/main.py \
+srun --cpu_bind=v --accel-bind=gn python -u src/open_clip_train/main.py \
--save-frequency 1 \
--report-to tensorboard \
--train-data="/data/LAION-400M/{00000..41455}.tar" \
@@ -307,7 +307,7 @@ srun --cpu_bind=v --accel-bind=gn python -u src/training/main.py \
### Resuming from a checkpoint:
```bash
-python -m training.main \
+python -m open_clip_train.main \
--train-data="/path/to/train_data.csv" \
--val-data="/path/to/validation_data.csv" \
--resume /path/to/checkpoints/epoch_K.pt
@@ -376,7 +376,7 @@ pd.DataFrame.from_dict(future_df).to_csv(
```
This should create a csv dataset that one can use to fine-tune coca with open_clip
```bash
-python -m training.main \
+python -m open_clip_train.main \
--dataset-type "csv" \
--train-data "path/to/data/dir/train2014.csv" \
--warmup 1000 \
@@ -392,7 +392,7 @@ python -m training.main \
--log-every-n-steps 100
```

-This is a general setting, open_clip has very parameters that can be set, ```python -m training.main --help``` should show them. The only relevant change compared to pre-training are the two arguments
+This is a general setting, open_clip has very parameters that can be set, ```python -m open_clip_train.main --help``` should show them. The only relevant change compared to pre-training are the two arguments

```bash
--coca-contrastive-loss-weight 0
@@ -404,7 +404,7 @@ which make the model only train the generative side.

If you wish to use different language models as the text encoder for CLIP you can do so by using one of the Hugging Face model configs in ```src/open_clip/model_configs``` and passing in it's tokenizer as the ```--model``` and ```--hf-tokenizer-name``` parameters respectively. Currently we only support RoBERTa ("test-roberta" config), however adding new models should be trivial. You can also determine how many layers, from the end, to leave unfrozen with the ```--lock-text-unlocked-layers``` parameter. Here's an example command to train CLIP with the RoBERTa LM that has it's last 10 layers unfrozen:
```bash
-python -m training.main \
+python -m open_clip_train.main \
--train-data="pipe:aws s3 cp s3://s-mas/cc3m/{00000..00329}.tar -" \
--train-num-samples 3000000 \
--val-data="pipe:aws s3 cp s3://s-mas/cc3m/{00330..00331}.tar -" \
@@ -453,7 +453,7 @@ We recommend https://github.com/LAION-AI/CLIP_benchmark#how-to-use for systemati
### Evaluating local checkpoint:
```bash
-python -m training.main \
+python -m open_clip_train.main \
--val-data="/path/to/validation_data.csv" \
--model RN101 \
--pretrained /path/to/checkpoints/epoch_K.pt
@@ -462,7 +462,7 @@ python -m training.main \
### Evaluating hosted pretrained checkpoint on ImageNet zero-shot prediction:
```bash
-python -m training.main \
+python -m open_clip_train.main \
--imagenet-val /path/to/imagenet/validation \
--model ViT-B-32-quickgelu \
--pretrained laion400m_e32
2 changes: 1 addition & 1 deletion docs/script_examples/clipa/vit_b16/i50_t16_finetune.sh
@@ -1,4 +1,4 @@
-torchrun --nproc_per_node 8 -m training.main \
+torchrun --nproc_per_node 8 -m open_clip_train.main \
--save-frequency 1 \
--save-most-recent \
--zeroshot-frequency 1 \
2 changes: 1 addition & 1 deletion docs/script_examples/clipa/vit_b16/i50_t16_pretrain.sh
@@ -1,4 +1,4 @@
-torchrun --nproc_per_node 8 -m training.main \
+torchrun --nproc_per_node 8 -m open_clip_train.main \
--save-frequency 1 \
--save-most-recent \
--zeroshot-frequency 1 \
2 changes: 1 addition & 1 deletion docs/script_examples/clipa/vit_l16/i17_t16_finetune.sh
@@ -1,4 +1,4 @@
-torchrun --nproc_per_node 8 -m training.main \
+torchrun --nproc_per_node 8 -m open_clip_train.main \
--save-frequency 1 \
--save-most-recent \
--zeroshot-frequency 1 \
2 changes: 1 addition & 1 deletion docs/script_examples/clipa/vit_l16/i17_t16_pretrain.sh
@@ -1,4 +1,4 @@
-torchrun --nproc_per_node 8 -m training.main \
+torchrun --nproc_per_node 8 -m open_clip_train.main \
--save-frequency 1 \
--save-most-recent \
--zeroshot-frequency 1 \
2 changes: 1 addition & 1 deletion docs/script_examples/clipa/vit_l16/i37_t8_finetune.sh
@@ -1,4 +1,4 @@
-torchrun --nproc_per_node 8 -m training.main \
+torchrun --nproc_per_node 8 -m open_clip_train.main \
--save-frequency 1 \
--save-most-recent \
--zeroshot-frequency 1 \
2 changes: 1 addition & 1 deletion docs/script_examples/clipa/vit_l16/i37_t8_pretrain.sh
@@ -1,4 +1,4 @@
-torchrun --nproc_per_node 8 -m training.main \
+torchrun --nproc_per_node 8 -m open_clip_train.main \
--save-frequency 1 \
--save-most-recent \
--zeroshot-frequency 1 \
@@ -1,7 +1,7 @@
# have not been tested. use it at your own discretion
# the original experiment was run on tpu v3-256.
# this example script assumes 8 gpus, each with huge memory. Tune batchsize, warmup, and lr accordingly if you have different machine setups.
-torchrun --nproc_per_node 8 -m training.main \
+torchrun --nproc_per_node 8 -m open_clip_train.main \
--save-frequency 1 \
--save-most-recent \
--zeroshot-frequency 1 \
2 changes: 1 addition & 1 deletion docs/script_examples/clipav2/vit_h14/i50_t8_pretrain.sh
@@ -1,7 +1,7 @@
# have not been tested. use it at your own discretion
# the original experiment was run on tpu v3-256.
# this example script assumes 8 gpus, each with huge memory. Tune batchsize, warmup, and lr accordingly if you have different machine setups.
-torchrun --nproc_per_node 8 -m training.main \
+torchrun --nproc_per_node 8 -m open_clip_train.main \
--save-frequency 1 \
--save-most-recent \
--zeroshot-frequency 1 \
@@ -1,7 +1,7 @@
# have not been tested. use it at your own discretion
# the original experiment was run on tpu v3-256.
# this example script assumes 8 gpus, each with huge memory. Tune batchsize, warmup, and lr accordingly if you have different machine setups.
-torchrun --nproc_per_node 8 -m training.main \
+torchrun --nproc_per_node 8 -m open_clip_train.main \
--save-frequency 1 \
--save-most-recent \
--zeroshot-frequency 1 \
2 changes: 1 addition & 1 deletion docs/script_examples/stability_example.sh
@@ -34,7 +34,7 @@ export PYTHONPATH="$PYTHONPATH:/admin/home-mitchellw/open_clip/src"

EXP_NAME="test-B-32-laion5b-lr1e-3-bs90k"

-srun --comment laion --cpu_bind=v --accel-bind=gn python -m training.main \
+srun --comment laion --cpu_bind=v --accel-bind=gn python -m open_clip_train.main \
--save-frequency 1 \
--train-data="pipe:aws s3 cp s3://s-datasets/laion5b/{laion2B-data/{000000..231349}.tar,laion2B-multi-data/{000000..226687}.tar,laion1B-nolang-data/{000000..127231}.tar} -" \
--train-num-samples 135646078 \
79 changes: 79 additions & 0 deletions pyproject.toml
@@ -0,0 +1,79 @@
[build-system]
requires = ["pdm-backend"]
build-backend = "pdm.backend"

[project]
name = "open_clip_torch"
# NOTE for full list of authors see https://github.com/mlfoundations/open_clip?tab=readme-ov-file#citing
# below covers most active / recent maintainers
authors = [
{name = "Ross Wightman", email = "[email protected]"},
{name = "Gabriel Ilharco"},
{name = "Mitchell Wortsman"},
{name = "Romain Beaumont"},
]
description = "Open reproduction of consastive language-image pretraining (CLIP) and related."
readme = "README.md"
requires-python = ">=3.8"
keywords = ["pytorch", "clip", "image-text", "language-image", "multimodal"]
license = {text = "MIT"}
classifiers = [
'Development Status :: 4 - Beta',
'Intended Audience :: Education',
'Intended Audience :: Science/Research',
'License :: OSI Approved :: MIT License',
'Programming Language :: Python :: 3.8',
'Programming Language :: Python :: 3.9',
'Programming Language :: Python :: 3.10',
'Programming Language :: Python :: 3.11',
'Programming Language :: Python :: 3.12',
'Topic :: Scientific/Engineering',
'Topic :: Scientific/Engineering :: Artificial Intelligence',
'Topic :: Software Development',
'Topic :: Software Development :: Libraries',
'Topic :: Software Development :: Libraries :: Python Modules',
]
dependencies = [
'torch>=1.9.0',
'torchvision',
'regex',
'ftfy',
'tqdm',
'huggingface-hub',
'timm',
]
dynamic = ["version"]

[project.optional-dependencies]
training = [
'torch>=2.0',
'webdataset>=0.2.5',
'pandas',
'transformers[sentencepiece]',
'timm>=1.0.7',
'fsspec',
]
test = [
'pytest-split',
'pytest',
'open_clip_torch[training]'
]

[project.urls]
homepage = "https://github.com/mlfoundations/open_clip"
repository = "https://github.com/mlfoundations/open_clip"

[tool.pdm.version]
source = "file"
path = "src/open_clip/version.py"

[tool.pdm.build]
excludes = ["./**/.git", "./**/logs/*"]
package-dir = "src"
includes = ["src/open_clip", "src/open_clip_train"]

[tool.pytest.ini_options]
testpaths = ['tests']
markers = [
'regression_test'
]
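The `[tool.pdm.version]` table above takes the package version from `src/open_clip/version.py` instead of hard-coding it in `pyproject.toml`. A rough sketch of that idea, assuming the file holds a single `__version__ = '...'` assignment; the regex and the `read_version` helper are illustrative only and are not pdm-backend's actual implementation:

```python
import re

# Matches a line such as: __version__ = '1.2.3'  (single or double quotes).
_VERSION_RE = re.compile(
    r"^__version__\s*=\s*['\"]([^'\"]+)['\"]", re.MULTILINE
)


def read_version(source: str) -> str:
    """Extract the version string from the text of a version.py-style file.

    Hypothetical helper for illustration; a build backend may parse the
    file differently.
    """
    match = _VERSION_RE.search(source)
    if match is None:
        raise ValueError("no __version__ assignment found")
    return match.group(1)


if __name__ == "__main__":
    # Placeholder contents; the real file lives at src/open_clip/version.py.
    sample = "__version__ = '0.0.0'\n"
    print(read_version(sample))
```

Because this commit also adds `from .version import __version__` to `src/open_clip/__init__.py`, the same value is exposed at runtime as `open_clip.__version__`, so the build metadata and the importable attribute share one source.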
2 changes: 1 addition & 1 deletion scripts/clipav1_vit_l16_i37_t8.sh
@@ -1,5 +1,5 @@
# eval on a single gpu
-CUDA_VISIBLE_DEVICES=2 TORCH_CUDNN_V8_API_ENABLED=1 TFDS_PREFETCH_SIZE=8192 python3 -m training.main \
+CUDA_VISIBLE_DEVICES=2 TORCH_CUDNN_V8_API_ENABLED=1 TFDS_PREFETCH_SIZE=8192 python3 -m open_clip_train.main \
--model ViT-L-16-CL32-GAP \
--pretrained "/path/to/clipa_vit_l16_i37_t8.pt" \
--seed 0 \
2 changes: 1 addition & 1 deletion scripts/clipav2_vit_h14_i84_224_336_cl32_gap_datacomp1b.sh
@@ -1,4 +1,4 @@
-CUDA_VISIBLE_DEVICES=1 python3 -m training.main \
+CUDA_VISIBLE_DEVICES=1 python3 -m open_clip_train.main \
--model ViT-H-14-CL32-GAP-BigVision \
--pretrained "/path/to/vit_h14_i84_224_336_cl32_gap_datacomp1b.pt" \
--force-image-size 336 \
2 changes: 1 addition & 1 deletion scripts/h14_224_32_finetune.sh
@@ -1,5 +1,5 @@
# 64k batchsize for 2.048e-3 lr
-TORCH_CUDNN_V8_API_ENABLED=1 torchrun --nproc_per_node 8 -m training.main \
+TORCH_CUDNN_V8_API_ENABLED=1 torchrun --nproc_per_node 8 -m open_clip_train.main \
--save-frequency 1 \
--save-most-recent \
--zeroshot-frequency 1 \
2 changes: 1 addition & 1 deletion scripts/h14_84_8_pretrain.sh
@@ -1,5 +1,5 @@
# 64k batchsize for 2.048e-3 lr
-TORCH_CUDNN_V8_API_ENABLED=1 torchrun --nproc_per_node 8 -m training.main \
+TORCH_CUDNN_V8_API_ENABLED=1 torchrun --nproc_per_node 8 -m open_clip_train.main \
--save-frequency 1 \
--save-most-recent \
--zeroshot-frequency 1 \
63 changes: 0 additions & 63 deletions setup.py

This file was deleted.

2 changes: 2 additions & 0 deletions src/open_clip/__init__.py
@@ -1,3 +1,5 @@
+from .version import __version__
+
from .coca_model import CoCa
from .constants import OPENAI_DATASET_MEAN, OPENAI_DATASET_STD
from .factory import create_model, create_model_and_transforms, create_model_from_pretrained, get_tokenizer, create_loss
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
14 changes: 7 additions & 7 deletions src/training/main.py → src/open_clip_train/main.py
@@ -29,13 +29,13 @@
hvd = None

from open_clip import create_model_and_transforms, trace_model, get_tokenizer, create_loss
-from training.data import get_data
-from training.distributed import is_master, init_distributed_device, broadcast_object
-from training.logger import setup_logging
-from training.params import parse_args
-from training.scheduler import cosine_lr, const_lr, const_lr_cooldown
-from training.train import train_one_epoch, evaluate
-from training.file_utils import pt_load, check_exists, start_sync_process, remote_sync
+from open_clip_train.data import get_data
+from open_clip_train.distributed import is_master, init_distributed_device, broadcast_object
+from open_clip_train.logger import setup_logging
+from open_clip_train.params import parse_args
+from open_clip_train.scheduler import cosine_lr, const_lr, const_lr_cooldown
+from open_clip_train.train import train_one_epoch, evaluate
+from open_clip_train.file_utils import pt_load, check_exists, start_sync_process, remote_sync


LATEST_CHECKPOINT_NAME = "epoch_latest.pt"
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.