This repository contains training and evaluation scripts for the VC2L framework, supporting three tasks: AnyCIR, SeqCIR, and CSR. The implementation builds upon the OpenCLIP project.
.
├── README.md # Project overview and instructions
├── gen_seek.py # Script to generate seek maps from JSONL
├── ppt_val.txt # Validation split for the CSR task
├── test_AnyCIR.py # Evaluation script for AnyCIR
├── test_CSR.py # Evaluation script for CSR
├── test_SeqCIR.py # Evaluation script for SeqCIR
├── unifont-9.0.06.hex # Font file (text rendering support)
├── unifont_upper-9.0.06.hex # Font file (text rendering support)
├── open_clip/ # Code adapted from OpenCLIP
└── training/ # Training code
The checkpoint pretrained on MMC4-core for 20 epochs can be found here.
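If you only want to run inference with the released weights, a minimal loading sketch is shown below. It assumes the ViT-B-16-448 config is registered by the open_clip/ copy in this repository (import from the repo root) and that the file is an OpenCLIP-style checkpoint; the path is hypothetical.

```python
# Minimal sketch: load the released checkpoint for inspection/inference.
# Assumes the ViT-B-16-448 config comes from the open_clip/ copy in this repo;
# the checkpoint path below is a placeholder.
import open_clip

model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-16-448",                                   # model config used in the training command below
    pretrained="checkpoints/vc2l_mmc4_core_ep20.pt",  # hypothetical path to the downloaded checkpoint
)
model.eval()
print(sum(p.numel() for p in model.parameters()), "parameters")
```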
- Download MMC4 from Hugging Face: https://huggingface.co/datasets/jmhessel/mmc4-core-ff
- Convert the annotation file to JSONL format. Example entry:
{
"text": [
"26/02/2015… Canon PowerShot ELPH 160 PDF User Manual / Owner’s Manual / User Guide offers information and instructions how to operate the PowerShot ELPH 160..."
],
"img": [
[
{
"image_name": "15edefb1780c.jpg",
"matched_text_index": 3,
"matched_sim": 0.259
},
{
"image_name": "8842c801adbe.jpg",
"matched_text_index": 6,
"matched_sim": 0.304
}
],
[]
]
}
- Generate the seek map:
python3 gen_seek.py path_to_jsonl_file
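The seek map itself is not documented here; the sketch below illustrates the presumable idea, assuming it records the starting byte offset of every JSONL line so a single document can be read with seek() instead of scanning the whole file. Function and file names are illustrative only, not the actual gen_seek.py implementation.

```python
# Illustrative sketch of a byte-offset seek map for a JSONL file.
import json

def build_seek_map(jsonl_path):
    """Record the starting byte offset of every line in a JSONL file."""
    offsets = []
    with open(jsonl_path, "rb") as f:
        pos = 0
        for line in f:
            offsets.append(pos)
            pos += len(line)
    return offsets

def read_record(jsonl_path, offsets, index):
    """Read a single JSON document without scanning the whole file."""
    with open(jsonl_path, "rb") as f:
        f.seek(offsets[index])
        return json.loads(f.readline())

# Example usage (paths are placeholders):
# offsets = build_seek_map("mmc4_core.jsonl")
# doc = read_record("mmc4_core.jsonl", offsets, 42)
```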
- Download OBELICS from Hugging Face: https://huggingface.co/datasets/HuggingFaceM4/OBELICS
- Convert the annotation file to JSONL format (see the reading sketch after this list). Example entry:
{
"text": ["Jean-Paul Sartre and Simone de Beauvoir at the Balzac Memorial...", "..."],
"img": [
["7b0cc10f1183..."],
["b991ec778d5e..."],
[],
[]
],
"num_images": 3
}
- Generate the seek map:
python3 gen_seek.py path_to_jsonl_file
- Download the Slide1M dataset from Stanford: https://exhibits.stanford.edu/data/catalog/mv327tb8364
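For orientation, the sketch below walks over one converted entry. It assumes img[i] lists the images associated with text[i] (an empty sub-list meaning no image for that segment); field names follow the examples above, everything else is illustrative.

```python
# Illustrative walk over one converted JSONL entry, assuming text[i] and img[i]
# are aligned as in the examples above (empty img[i] = no image for that segment).
import json

with open("obelics.jsonl", "r", encoding="utf-8") as f:  # placeholder path
    entry = json.loads(f.readline())

for i, segment in enumerate(entry["text"]):
    images = entry["img"][i] if i < len(entry["img"]) else []
    print(f"segment {i}: {segment[:60]!r} -> {len(images)} image(s)")
```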
Install the required packages:
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
pip install open_clip_torch wandb opencv-python orjson nlpaug pathlib

To start training with multi-GPU support:
torchrun --nproc_per_node ${num_gpu} \
--master_port 23456 \
-m training.main_v2cl \
--lr 1e-4 \
--name ${exp_name} \
--epochs 20 \
--train-data ${dataset_name} \
--dataset-type vc2ldataset \
--csv_root '' \
--save-frequency 1 \
--batch-size 32 \
--accum-freq 2 \
--precision amp \
--workers 12 \
--model ViT-B-16-448 \
--zeroshot-frequency 100 \
--pretrained ${pretrained_ckpt} \
--report-to wandb \
--torchcompile \
--drop_rate 0.4 \
--aug_text 0.4 \
--long_range 0 \
--wandb-project-name ${wandb_name}

Replace placeholders such as ${num_gpu}, ${exp_name}, ${dataset_name}, etc., with your actual configuration.
In test_AnyCIR.py, update the following variables:
- OBLICES_PATH: Path to the OBELICS dataset
- OBLICES_SEEK_MAP_PATH: Path to the OBELICS seek map
- ckpt_list: List of trained model checkpoints to evaluate
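For illustration, these variables might look as follows near the top of test_AnyCIR.py; the paths and checkpoint names shown are hypothetical and should be replaced with your own.

```python
# Hypothetical values for the variables in test_AnyCIR.py; adjust to your setup.
OBLICES_PATH = "/data/obelics/obelics.jsonl"               # OBELICS annotations in JSONL format
OBLICES_SEEK_MAP_PATH = "/data/obelics/obelics_seek_map"   # seek map produced by gen_seek.py
ckpt_list = [
    "logs/vc2l_vitb16_448/checkpoints/epoch_10.pt",
    "logs/vc2l_vitb16_448/checkpoints/epoch_20.pt",
]
```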
Run the script:
python test_AnyCIR.py

In test_SeqCIR.py, update the following variables:
- OBLICES_PATH: Path to the OBELICS dataset
- OBLICES_SEEK_MAP_PATH: Path to the OBELICS seek map
- ckpt_list: List of trained model checkpoints to evaluate
Run the script:
python test_SeqCIR.py

In test_CSR.py, update the following variables:
- DATASET_ROOT: Path to the CSR dataset
- ckpt_list: List of trained model checkpoints to evaluate
Run the script:
python test_CSR.py