Communication-Efficient Distributed Deep Learning with Merged Gradient Sparsification on GPUs

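## In a nutshell
Formulates the communication/computation trade-off that arises when gradient sparsification is combined with pipelining of computation as an optimization problem, and makes it work in a distributed setting.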

### Paper link
https://www.comp.hkbu.edu.hk/~chxw/papers/infocom_2020_MGS.pdf

### Authors / Affiliations
Shaohuai Shi†, Qiang Wang†, Xiaowen Chu†∗, Bo Li‡, Yang Qin§, Ruihao Liu¶, Xinxiao Zhao¶
†High-Performance Machine Learning Lab, Department of Computer Science, Hong Kong Baptist University
‡Department of Computer Science and Engineering, The Hong Kong University of Science and Technology
§Department of Computer Science and Technology, Harbin Institute of Technology (Shenzhen)
¶MassGrid.com, Shenzhen District Block Technology Co., Ltd.

### Publication date (yyyy/MM/dd)
IEEE INFOCOM 2020, 2020/07/06-09

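## Overview
Techniques for cutting the total training time of models in distributed settings, such as Top-k sparsification and pipelining, introduce computation and communication costs of their own, and these can themselves become bottlenecks.
The combination of the two is known as LAGS-SGD; in the approach here, the number of consecutive layers to merge is first chosen so as to minimize the iteration time, and SGD is then applied based on that choice.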
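A minimal NumPy sketch of what sparsifying a *merged* group of layers means: the gradients of consecutive layers are concatenated into one buffer, and a single Top-k selection by magnitude is run over that buffer instead of once per layer. The function name `merged_topk` and the `density` parameter are hypothetical; this is not the authors' GPU implementation, and the residual (error-feedback) accumulation that Top-k SGD variants typically use is left out.

```python
import numpy as np

def merged_topk(layer_grads, density):
    """Hypothetical helper: one Top-k selection over a merged group of layers.

    layer_grads: list of numpy arrays, one per layer in the merged group.
    density: fraction of elements to keep (assumed parameter name).
    Returns (indices, values) into the flattened, concatenated buffer.
    """
    merged = np.concatenate([g.ravel() for g in layer_grads])
    k = max(1, int(density * merged.size))
    # argpartition places the k largest magnitudes in the last k slots
    # without fully sorting the buffer.
    kth = merged.size - k
    idx = np.argpartition(np.abs(merged), kth)[kth:]
    idx = np.sort(idx)  # sorted indices are convenient for packing/sending
    return idx, merged[idx]
```

With two layers of 3 and 2 elements and `density=0.4`, one selection keeps the 2 largest-magnitude entries across both layers, rather than picking a separate quota from each layer.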
![image](https://user-images.githubusercontent.com/47178807/120671287-336bb000-c4cc-11eb-8767-6dd19ffbd56a.png)


## Novelty / Differences from prior work
Combines Top-k sparsification with pipelining while also optimizing which layers to merge.
![image](https://user-images.githubusercontent.com/47178807/120670348-4af66900-c4cb-11eb-8854-905732858852.png)


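## Method
For gradient communication, com(a+b) < com(a) + com(b) holds: sending two gradients together is cheaper than sending them separately, since every message pays a fixed startup latency.
Sparsification, however, satisfies s(a+b) > s(a) + s(b): the more elements there are, the disproportionately more work the selection takes.
We therefore consider merging some runs of consecutive layers among the L layers.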
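The two inequalities can be illustrated with assumed cost models: an α-β model for communication (each message costs a fixed startup α plus β per element) and an m·log m model for sparsification. The constants `ALPHA`, `BETA`, `GAMMA`, the m·log m form, and the helper `merge_tradeoff` are all assumptions for illustration, not the paper's measured costs.

```python
import math

# Assumed constants (illustrative only, not measured values from the paper).
ALPHA = 1e-4   # per-message startup latency in seconds
BETA = 1e-9    # per-element transfer time in seconds
GAMMA = 1e-10  # sparsification cost constant

def com(m):
    """Alpha-beta communication model: every message pays ALPHA once."""
    return ALPHA + BETA * m

def s(m):
    """Assumed superlinear sparsification cost, modeled as m * log2(m)."""
    return GAMMA * m * math.log2(m) if m > 1 else 0.0

def merge_tradeoff(a, b):
    """Hypothetical helper: what merging two gradients of sizes a and b changes."""
    saved_comm = com(a) + com(b) - com(a + b)  # equals ALPHA under this model
    extra_sparsify = s(a + b) - s(a) - s(b)    # positive because s is superlinear
    return saved_comm, extra_sparsify

if __name__ == "__main__":
    saved, extra = merge_tradeoff(1_000_000, 2_000_000)
    print(f"comm saved: {saved:.2e} s, extra sparsification: {extra:.2e} s")
```

Under these assumptions merging always saves exactly one startup latency while always adding some sparsification work, so whether a merge pays off depends on the sizes involved, which is what motivates the optimization.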
![image](https://user-images.githubusercontent.com/47178807/120669595-8e040c80-c4ca-11eb-8348-05d840cfa366.png)
![image](https://user-images.githubusercontent.com/47178807/120669606-90fefd00-c4ca-11eb-95b5-5d062ab43673.png)
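M is the set of all merge/no-merge combinations, t_s is the sparsification time, t_c is the communication time, and τ is the cumulative time (the max takes whichever of its arguments is slower).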
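A simplified sketch of a pipelined timing model plus a brute-force search over the merge/no-merge choice between each pair of adjacent layers (2^(L-1) partitions in total). Assumptions, none of them taken from the paper's exact formulation: backward computes layers from last to first; sparsification and communication each run on their own stream without slowing the backward pass; and each stage of a group starts at the later of its own input being ready and the previous group finishing that stage (the max above). The names `iteration_time` and `best_merge` are hypothetical. Exhaustive search is only practical for small L and is not necessarily how the paper solves the problem.

```python
import itertools

def iteration_time(groups, tb, sizes, ts, tc):
    """Finish time of the last communication, measured from the start of backward.

    groups: lists of layer indices, in the order their gradients become ready.
    tb[l]: backward compute time of layer l; sizes[l]: gradient element count.
    ts(m), tc(m): sparsification / communication time for m merged elements.
    """
    t_bwd = 0.0  # when backward of the group's last layer finishes
    t_sp = 0.0   # when the previous group's sparsification finished
    t_cm = 0.0   # when the previous group's communication finished
    for g in groups:
        t_bwd += sum(tb[l] for l in g)   # group is ready once all its layers are computed
        m = sum(sizes[l] for l in g)
        t_sp = max(t_bwd, t_sp) + ts(m)  # wait for the slower input, then sparsify
        t_cm = max(t_sp, t_cm) + tc(m)   # wait for the slower input, then communicate
    return t_cm

def best_merge(tb, sizes, ts, tc):
    """Try every merge/no-merge choice between adjacent layers; return (time, groups)."""
    order = list(range(len(tb) - 1, -1, -1))  # backward visits the last layer first
    best = None
    for flags in itertools.product((False, True), repeat=len(order) - 1):
        groups, cur = [], [order[0]]
        for layer, merge in zip(order[1:], flags):
            if merge:
                cur.append(layer)     # merge with the current group
            else:
                groups.append(cur)    # close the current group, start a new one
                cur = [layer]
        groups.append(cur)
        t = iteration_time(groups, tb, sizes, ts, tc)
        if best is None or t < best[0]:
            best = (t, groups)
    return best
```

For example, with zero backward time, no sparsification cost, and a large startup term in `tc`, merging all layers into one group comes out best: nothing can be overlapped, so each extra message only adds startup latency.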


## Results
Experiments on the following four model/dataset pairs:
- VGG-16 - CIFAR-10
- ResNet-50 - ImageNet
- Inception-v4 - ImageNet
- 2-layer LSTM - PTB

Final convergence is on par with vanilla SGD.
![image](https://user-images.githubusercontent.com/47178807/120672307-24393200-c4cd-11eb-87c5-a1f3b3dc8cfc.png)

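Breakdown of where the time goes within one iteration, for vanilla SGD vs. the proposed method.
The proposed method appears to cut communication cost effectively for models with a fairly large number of parameters.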
![image](https://user-images.githubusercontent.com/47178807/120672717-82feab80-c4cd-11eb-9cd2-8f23068ec9dc.png)


## Comments
