<h1 style="text-align:center"> Batch Model Consolidation: A Multi-Task Model Consolidation Framework </h1>

<h3 style="text-align:center"> Iordanis Fostiropoulos Jiaye Zhu Laurent Itti</h3>
<p style="text-align:center"> University of Southern California</p>

<p style="text-align:center">
<a href="">[Arxiv]</a>
|
<a href="https://github.com/fostiropoulos/stream_benchmark">[Code]</a>
|
<a href="https://github.com/fostiropoulos/stream">[Dataset]</a>
</p>

## Abstract

In Continual Learning (CL), a model is required to learn a stream of tasks sequentially
without significant performance degradation on previously learned tasks.
Current approaches fail for a long sequence of tasks from diverse domains and difficulties.
Many of the existing CL approaches are difficult to apply in practice due to excessive memory
cost or training time, or are tightly coupled to a single device. With the intuition
derived from the widely applied mini-batch training, we propose Batch Model Consolidation
(**BMC**) to support more realistic CL under conditions where multiple agents are
exposed to a range of tasks. During a _regularization_ phase, BMC trains multiple
_expert models_ in parallel on a set of disjoint tasks. Each expert maintains weight
similarity to a _base model_ through a _stability loss_, and constructs a
_buffer_ from a fraction of the task's data. During the _consolidation_ phase,
BMC combines the knowledge learned by a batch of _expert models_ using a
_batched consolidation loss_ on _memory_ data that aggregates all buffers.
We thoroughly evaluate each component of our method in an ablation study and demonstrate
its effectiveness on the standardized benchmarks Split-CIFAR-100 and Tiny-ImageNet,
and on the Stream dataset composed of 71 image classification tasks from diverse domains
and difficulties. Our method outperforms the next best CL approach by 70% and is the
only approach that can maintain performance at the end of 71 tasks.

## Overview

Intuition: similar to mini-batch training, batched task training can reduce local minima and improve the convexity of the loss landscape.

<p style="text-align:center">
<img src="https://drive.google.com/uc?export=view&id=1ZgwGy1Ta2u9Wim0D010uf7cSGw07qts9" alt="drawing" width="60%"/>
</p>

BMC optimizes multiple expert models from a single base model in parallel on **different** tasks,
enforcing parameter isolation. Experts are regularized during training to reduce forgetting
on the tasks learned by the base model. A new base model is consolidated by **batched distillation** from the experts.

![method](assets/method.png)

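As a rough illustration of the _regularization_ phase, the sketch below trains one expert from a copy of the base model with a task loss plus a stability penalty, and collects a small buffer of the task's data. It assumes PyTorch classifiers; the function and argument names are hypothetical, and the stability loss is written as a simple L2 penalty on weight drift, which may differ from the exact loss used in the paper.

```python
# Hypothetical sketch of the regularization phase (not the official implementation).
import copy

import torch
import torch.nn.functional as F


def train_expert(base_model, task_loader, lr=1e-3, stability_coef=1.0, buffer_fraction=0.1):
    """Train one expert on its task while staying close to the base model."""
    expert = copy.deepcopy(base_model)
    base_params = [p.detach().clone() for p in base_model.parameters()]
    optimizer = torch.optim.SGD(expert.parameters(), lr=lr)

    buffer = []
    for x, y in task_loader:
        optimizer.zero_grad()
        task_loss = F.cross_entropy(expert(x), y)
        # Stability loss (assumed L2 form): penalize drift from the base model's weights.
        stability_loss = sum(
            (p - p0).pow(2).sum() for p, p0 in zip(expert.parameters(), base_params)
        )
        (task_loss + stability_coef * stability_loss).backward()
        optimizer.step()
        # Keep a fraction of the task's data as this expert's buffer.
        keep = max(1, int(buffer_fraction * len(x)))
        buffer.extend(zip(x[:keep], y[:keep]))
    return expert, buffer
```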
BMC supports distributed training, where experts are trained locally on remote devices.
Their artifacts (expert weights and buffers) are sent back to the central device for consolidation training.
The parallelism of this framework enables BMC to learn long task sequences efficiently.

![system](assets/system.png)

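To make the _consolidation_ phase concrete, here is a minimal sketch with hypothetical names: the new base model is distilled from a batch of experts on memory data that aggregates all of their buffers. The batched consolidation loss is approximated by a plain knowledge-distillation term summed over the experts (assuming a shared output space), which may differ from the exact loss in the paper.

```python
# Hypothetical sketch of the consolidation phase (assumes experts share an output space).
import torch
import torch.nn.functional as F


def consolidate(base_model, experts, memory_loader, lr=1e-3, temperature=2.0):
    """Distill a batch of experts into the base model on the aggregated memory."""
    optimizer = torch.optim.SGD(base_model.parameters(), lr=lr)
    for x, y in memory_loader:
        optimizer.zero_grad()
        student_logits = base_model(x)
        loss = F.cross_entropy(student_logits, y)
        # Batched distillation: match the base model to every expert at once.
        for expert in experts:
            with torch.no_grad():
                teacher_logits = expert(x)
            loss = loss + (temperature ** 2) * F.kl_div(
                F.log_softmax(student_logits / temperature, dim=-1),
                F.softmax(teacher_logits / temperature, dim=-1),
                reduction="batchmean",
            )
        loss.backward()
        optimizer.step()
    return base_model
```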
## The Stream Dataset

The Stream dataset implements the logic for processing and managing a large sequence of datasets,
and provides a way to train on interdisciplinary tasks by projecting all datasets to a common
feature dimension using features extracted from pre-trained models.
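The snippet below sketches this projection step under stated assumptions: a frozen pre-trained encoder (for example a CLIP image encoder, as used in the benchmark below) maps every dataset, regardless of domain, to embeddings of the same dimension. The function name is hypothetical; this is not the Stream package's actual API.

```python
# Hypothetical sketch of projecting heterogeneous datasets to one feature dimension.
import torch


@torch.no_grad()
def extract_features(encoder, loader, device="cpu"):
    """Embed a dataset with a frozen pre-trained encoder."""
    encoder.eval().to(device)
    feats, labels = [], []
    for x, y in loader:
        # Every dataset maps to the encoder's fixed output dimension.
        feats.append(encoder(x.to(device)).cpu())
        labels.append(y)
    return torch.cat(feats), torch.cat(labels)
```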

See [the repository](https://github.com/fostiropoulos/stream/tree/cvpr_release) for Stream dataset installation and usage.

## Class-Incremental Learning

We show on the Stream dataset with CLIP embeddings that our method outperforms all other baselines in the Class-Incremental Learning scenario.
Our implementation of BMC, as well as the baselines, can be found [here](https://github.com/fostiropoulos/stream_benchmark).

![table](assets/codalab.png)

## Citation

```
```