auto dataset support
fostiropoulos committed Jun 10, 2023
1 parent 8c495ec commit 9e513d9
Showing 6 changed files with 102 additions and 53 deletions.
16 changes: 9 additions & 7 deletions README.md
@@ -4,7 +4,7 @@ The code was used for the experiments and results of
**Batch-Model-Consolidation** [[arXiv]](https://openaccess.thecvf.com/content/CVPR2023/papers/Fostiropoulos_Batch_Model_Consolidation_A_Multi-Task_Model_Consolidation_Framework_CVPR_2023_paper.pdf) [[Website]](https://fostiropoulos.github.io/stream_benchmark/).
If using this code please cite:

-```
+```bibtex
@inproceedings{fostiropoulos2023batch,
title={Batch Model Consolidation: A Multi-Task Model Consolidation Framework},
author={Fostiropoulos, Iordanis and Zhu, Jiaye and Itti, Laurent},
@@ -13,22 +13,22 @@ If using this code please cite:
year={2023}
}
```
-This repository is a benchmark of methods found in [FACIL](https://github.com/mmasana/FACIL) and [Mammoth](https://github.com/aimagelab/mammoth) combined and adapted to work with the [Stream](https://github.com/fostiropoulos/stream) dataset.
+This repository is a benchmark of methods found in [FACIL](https://github.com/mmasana/FACIL) and [Mammoth](https://github.com/aimagelab/mammoth) combined and adapted to work with the [AutoDS](https://github.com/fostiropoulos/auto-dataset) dataset to evaluate methods on a long sequence of tasks.



## Install

-1. Install the [Stream dataset](https://github.com/fostiropoulos/stream).
+1. Install the [AutoDS dataset](https://github.com/fostiropoulos/auto-dataset).
2. `git clone https://github.com/fostiropoulos/stream_benchmark.git`
3. `cd stream_benchmark`
4. `pip install . stream_benchmark`


-## Stream Feature Vectors [Download](https://drive.google.com/file/d/1insLK3FoGw-UEQUNnhzyxsql7z28lplZ/view)
+## AutoDS Feature Vectors [Download](https://drive.google.com/file/d/1insLK3FoGw-UEQUNnhzyxsql7z28lplZ/view)

We use 71 datasets with extracted features from pre-trained models,
-supported in the Stream dataset. [The detailed table](https://github.com/fostiropoulos/stream/blob/cvpr_release/assets/DATASET_TABLE.md).
+supported in the AutoDS dataset. [The detailed table](https://github.com/fostiropoulos/auto-dataset/blob/cvpr_release/assets/DATASET_TABLE.md).

## Hyperparameters

@@ -57,12 +57,14 @@ For `model_name` support see below.
`{num_gpus}` is the fractional number of GPUs to use.
Set this so that `{GPU usage per experiment} * {num_gpus} < 1`.

+## Extending
+
+The code in [test_benchmark.py](tests/test_benchmark.py) is a good starting point, as a simple example (ignoring the mock patching), for understanding how the benchmark can be extended.


## Methods implemented
-| Description | `model_name` | File |
-|:-----------------------------------------------------------------|:--------------------------------------------------------------------------------------------------------|:-----------------------------------------------------|
+| Description                                                       | `model_name`                                                                                              | File                                                  |
+| :--------------------------------------------------------------- | :------------------------------------------------------------------------------------------------------ | :--------------------------------------------------- |
| Continual learning via Gradient Episodic Memory. | [gem](https://arxiv.org/abs/1706.08840) | [gem.py](stream_benchmark/models/gem.py) |
| Continual learning via online EWC. | [ewc_on](https://arxiv.org/pdf/1805.06370.pdf) | [ewc_on.py](stream_benchmark/models/ewc_on.py) |
| Continual learning via MAS. | [mas](https://arxiv.org/abs/1711.09601) | [mas.py](stream_benchmark/models/mas.py) |
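
For reference, the rewritten test in this commit (tests/test_benchmark.py, shown later in this diff) drives the benchmark programmatically through `train_method`, passing the path to a JSON file of hyperparameters. A minimal sketch of that entry point (the working directory is hypothetical; the `hparams` dict mirrors the one defined in the test):

```python
import json
from pathlib import Path

from stream_benchmark.__main__ import train_method

# Hypothetical working directory; the AutoDS feature vectors are assumed
# to have been downloaded and extracted under the same root.
work_dir = Path("/tmp/stream_benchmark_run")
work_dir.mkdir(parents=True, exist_ok=True)

# Hyperparameters mirroring tests/test_benchmark.py in this commit.
hparams_path = work_dir / "hparams.json"
hparams_path.write_text(json.dumps({
    "early_stopping_patience": 10,
    "batch_size": 64,
    "buffer_size": 10000,
    "lr": 0.1,
    "minibatch_size": 64,
    "n_epochs": 20,
    "scheduler_threshold": 1e-4,
    "scheduler_patience": 10,
    "device": "cuda",
    "sgd": {},
}))

train_method(
    save_path=work_dir,
    model_name="sgd",
    dataset_path=work_dir,
    hparams=hparams_path,
)
```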
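
The new "Extending" note above points at tests/test_benchmark.py. The pattern used there is to substitute `SequentialStream.make_ds` with a custom dataset factory via `mock.patch`. A rough sketch of the same pattern (`my_datasets` is a hypothetical list; the names must be datasets AutoDS knows about):

```python
from unittest import mock

from autods.main import AutoDS

# Hypothetical: restrict the task stream to datasets of your choosing.
my_datasets = ["cifar100"]


def make_ds(self, task_id, train):
    # Mirrors the factory patched in by tests/test_benchmark.py.
    transform = None
    if self.feats_name is None:
        transform = self.transforms(train)
    return AutoDS(
        self.root_path,
        task_id=task_id,
        feats_name=self.feats_name,
        train=train,
        transform=transform,
        datasets=my_datasets,
    )


with mock.patch(
    "stream_benchmark.datasets.seq_stream.SequentialStream.make_ds", make_ds
):
    pass  # call train_method(...) here, as in the previous sketch
```
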
10 changes: 5 additions & 5 deletions docs/README.md
@@ -9,7 +9,7 @@
&nbsp;&nbsp;&nbsp;
<a href="https://github.com/fostiropoulos/stream_benchmark">[Code]</a>
&nbsp;&nbsp;&nbsp;
-<a href="https://github.com/fostiropoulos/stream">[Dataset]</a>
+<a href="https://github.com/fostiropoulos/auto-dataset">[Dataset]</a>
</p>

## Abstract
@@ -53,13 +53,13 @@ The parallelism of this framework enables BMC to learn long task sequences efficiently.

![Paralleled multi-expert training framework](https://drive.google.com/uc?export=view&id=1NAswFVQtiNn6xkilUig42guGfvi-babV)

-## The Stream Dataset
+## Auto-Dataset

-Stream dataset implements the logic for processing and managing a large sequence of datasets,
+AutoDS implements the logic for processing and managing a large sequence of datasets,
and provides a method to train on interdisciplinary tasks by projecting all datasets on the same dimension,
by extracting features from pre-trained models.

-See [the repository](https://github.com/fostiropoulos/stream/tree/cvpr_release) for Stream dataset installation and usages.
+See [the repository](https://github.com/fostiropoulos/auto-dataset/) for dataset installation and usage.

Download the extracted features for Stream datasets [here](https://drive.google.com/file/d/1insLK3FoGw-UEQUNnhzyxsql7z28lplZ/view).

@@ -74,7 +74,7 @@ Our implementation of BMC as well as the baselines can be found [here](https://g

## Citation

-```
+```bibtex
@inproceedings{fostiropoulos2023batch,
title={Batch Model Consolidation: A Multi-Task Model Consolidation Framework},
author={Fostiropoulos, Iordanis and Zhu, Jiaye and Itti, Laurent},
6 changes: 3 additions & 3 deletions setup.py
@@ -7,8 +7,8 @@
version="1.0",
description="Stream Benchmark",
author="Iordanis Fostiropoulos",
author_email="dev@iordanis.xyz",
url="https://iordanis.xyz/",
author_email="mail@iordanis.me",
url="https://iordanis.me/",
    python_requires=">3.10",
    long_description=open("README.md").read(),
    packages=find_packages(),
@@ -22,7 +22,7 @@
"quadprog==0.1.11",
"pandas==2.0.0",
"tabulate==0.9.0",
"stream @ git+https://github.com/fostiropoulos/stream.git",
"autods==1.0",
],
extras_require={
"dev": [
19 changes: 13 additions & 6 deletions stream_benchmark/datasets/aux_cifar100.py
@@ -1,7 +1,7 @@
from pathlib import Path

-from stream.main import Stream
from torch.utils.data import DataLoader, Dataset
+from autods.main import AutoDS


class AuxDataset(Dataset):
@@ -40,14 +40,21 @@ def make_ds(self):
        root_path = Path(self.dataset_path)

        # use the full dataset as aux data, no need to split
-        train_ds = Stream(
-            root_path=root_path, datasets=["cifar100"], task_id = 0, feats_name="default", train=True
+        train_ds = AutoDS(
+            root_path=root_path,
+            datasets=["cifar100"],
+            task_id=0,
+            feats_name="default",
+            train=True,
        )
-        test_ds = Stream(
-            root_path=root_path, datasets=["cifar100"], task_id = 0, feats_name="default", train=False
+        test_ds = AutoDS(
+            root_path=root_path,
+            datasets=["cifar100"],
+            task_id=0,
+            feats_name="default",
+            train=False,
        )
        return train_ds, test_ds


    def get_data_loaders(self):
        return self.train_loader, self.test_loader
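
Read on their own, the constructor calls above give a minimal recipe for loading a single AutoDS task as pre-extracted feature vectors; a sketch, with a hypothetical root path:

```python
from pathlib import Path

from autods.main import AutoDS
from torch.utils.data import DataLoader

root_path = Path("/data/autods")  # hypothetical location of the feature vectors

# feats_name="default" loads pre-extracted features rather than raw images.
train_ds = AutoDS(
    root_path=root_path,
    datasets=["cifar100"],
    task_id=0,
    feats_name="default",
    train=True,
)
train_loader = DataLoader(train_ds, batch_size=64, shuffle=True)
```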
6 changes: 3 additions & 3 deletions stream_benchmark/datasets/seq_stream.py
@@ -1,5 +1,5 @@
import torch.nn.functional as F
-from stream.main import Stream
+from autods.main import AutoDS
from torch.utils.data import DataLoader
from torch.utils.data.dataset import ConcatDataset
from torchvision import transforms
@@ -37,7 +37,7 @@ def __init__(
self.feats_name = "default"
self.image_size = 224
self.val_image_size = 224
mock_ds: Stream = self.make_ds(task_id, True)
mock_ds: AutoDS = self.make_ds(task_id, True)
if isinstance(mock_ds.dataset, ConcatDataset):
self.dataset_len = [len(ds) for ds in mock_ds.dataset.datasets]

@@ -75,7 +75,7 @@ def make_ds(self, task_id, train):
        if self.feats_name is None:
            transform = self.transforms(train)

-        s = Stream(
+        s = AutoDS(
            self.root_path,
            task_id=task_id,
            feats_name=self.feats_name,
98 changes: 69 additions & 29 deletions tests/test_benchmark.py
@@ -1,5 +1,6 @@
import copy
import io
+import json
import logging
import shutil
import tempfile
@@ -9,18 +10,33 @@

import numpy as np
import torch
+from autods.dataset import Dataset
+from autods.main import AutoDS
+from autods.utils import extract
from PIL import Image
-from stream.dataset import Dataset
-from stream.main import Stream
-from stream.utils import extract

from stream_benchmark.__main__ import train_method
-from stream_benchmark.datasets.seq_stream import include_ds

DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
+MAKE_FEATS_BATCH_SIZE = 500
+include_ds = ["mock1", "mock2", "mock3"]
+hparams = {
+    "early_stopping_patience": 10,
+    "batch_size": 64,
+    "buffer_size": 10000,
+    "lr": 0.1,
+    "minibatch_size": 64,
+    "n_epochs": 20,
+    "scheduler_threshold": 1e-4,
+    "scheduler_patience": 10,
+    "device": "cuda",
+    "sgd": {},
+}


class MockDataset(Dataset):
metadata_url = "https://iordanis.xyz/"
metadata_url = "https://iordanis.me/"
remote_urls = {"mock.tar": None}
name = "mock"
file_hash_map = {"mock.tar": "blahblah"}
@@ -60,9 +76,7 @@ def __init__(
kwargs["action"] = "process"
super().__init__(*args, **kwargs)
if mock_download:
self.make_features(500, "cuda","clip")

# ds.make_features(1024, DEVICE, clean=True, feature_extractor="clip")
self.make_features(MAKE_FEATS_BATCH_SIZE, DEVICE, "clip")

    def _process(self, raw_data_dir: Path):
        archive_path = raw_data_dir.joinpath("mock.tar")
@@ -87,35 +101,61 @@ def _make_metadata(self, raw_data_dir: Path):
        torch.save(metadata, self.metadata_path)


-class MockDataset2(MockDataset):
-    name = "mock2"
-    pass
+datasets = []
+for ds in include_ds:
+
+    class _MockClass(MockDataset):
+        pass
+
-class MockDataset3(MockDataset):
-    name = "mock3"
-    pass
+    _MockClass.__name__ = ds.upper()
+    _MockClass.name = ds
+    datasets.append(_MockClass)
+
-class MockDataset4(MockDataset):
-    name = "mock4"
-    pass
+sizes = (np.arange(len(datasets)) + 1) * 100


+def make_ds(self, task_id, train):

-def test_benchmark(tmp_path: Path):
-    datasets = [MockDataset, MockDataset2, MockDataset3, MockDataset4]
-    sizes = (np.arange(len(datasets)) + 1) * 100
-    with mock.patch(
-        "stream.main.Stream.supported_datasets",
+        "autods.main.AutoDS.supported_datasets",
-        return_value=datasets,
-    ):
-        for ds, size in zip(datasets, sizes):
-            ds(tmp_path, size=size, mock_download=True)
-
-    with mock.patch(
-        "stream.dataset.Dataset.assert_downloaded", return_value=True
-    ), mock.patch("stream.dataset.Dataset.verify_downloaded", return_value=True):
-        train_method(tmp_path, "sgd", tmp_path, "clip")
-    breakpoint()
-    return

+    transform = None
+    if self.feats_name is None:
+        transform = self.transforms(train)
+
+    s = AutoDS(
+        self.root_path,
+        task_id=task_id,
+        feats_name=self.feats_name,
+        train=train,
+        transform=transform,
+        datasets=include_ds,
+    )
+    return s


+def test_benchmark(tmp_path: Path):
+
+    hpp = tmp_path.joinpath("hparams.json")
+    hpp.write_text(json.dumps(hparams))
+
+    for ds, size in zip(datasets, sizes):
+        ds(tmp_path, size=size, mock_download=True)
+
+    with mock.patch(
+        "stream_benchmark.datasets.seq_stream.SequentialStream.make_ds", make_ds
+    ), mock.patch(
+        "autods.dataset.Dataset.assert_downloaded", return_value=True
+    ), mock.patch(
+        "autods.dataset.Dataset.verify_downloaded", return_value=True
+    ):
+        train_method(
+            save_path=tmp_path, model_name="sgd", dataset_path=tmp_path, hparams=hpp
+        )
+    breakpoint()
+    return


if __name__ == "__main__":
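
The rewritten test ends with a `breakpoint()`, which drops into the debugger once `train_method` returns. It can presumably be run under pytest as usual; a sketch, assuming pytest is installed and a CUDA device is available (the test's hyperparameters pin `device` to `"cuda"`):

```python
import pytest

# Equivalent to running `pytest tests/test_benchmark.py` from the repository root.
pytest.main(["tests/test_benchmark.py"])
```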
