## Index

<!-- TOC -->

<!-- /TOC -->
## 41. From which directions should one think about and address overfitting in deep learning?

If a model trains poorly, first examine whether any of the following aspects can be improved.

(1) Choosing a proper loss function
A neural network's loss function is non-convex and has multiple local minima; the goal is to find a usable minimum. Non-convex surfaces are rugged, but different loss functions are rugged to different degrees. Of the two losses below, cross entropy varies more steeply than squared error, which makes a usable minimum easier to reach and thus helps optimization.

- Square Error (squared loss)
- Cross Entropy (cross-entropy loss)
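
To make the contrast concrete, here is a small illustrative sketch (the numbers are made up, not from the original) of the gradient a single sigmoid unit receives under each loss. The extra sigmoid-derivative factor in the squared-error gradient is what stalls learning on a saturated, badly wrong prediction, while cross entropy keeps a strong signal:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def grad_squared_error(z, y):
    # dL/dz for L = 0.5 * (a - y)**2 with a = sigmoid(z); the extra
    # sigmoid'(z) = a * (1 - a) factor vanishes when the unit saturates.
    a = sigmoid(z)
    return (a - y) * a * (1.0 - a)

def grad_cross_entropy(z, y):
    # dL/dz for L = -(y * ln(a) + (1 - y) * ln(1 - a)); the sigmoid'
    # factor cancels, leaving simply a - y.
    a = sigmoid(z)
    return a - y

# A saturated, badly wrong prediction: z = 5 but the target is 0.
print(grad_squared_error(5.0, 0.0))  # ~ 0.0066: learning stalls
print(grad_cross_entropy(5.0, 0.0))  # ~ 0.9933: strong corrective signal
```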

(2) Choosing a proper mini-batch size
Learning with mini-batches both reduces the amount of computation per update and helps the optimizer jump out of local minima, so mini-batches should be used. Beyond that, the batch size itself matters: too large a batch tends to get trapped in a local minimum, while too small a batch makes the updates jitter badly, so a suitable batch size must be chosen.
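
A minimal sketch of one epoch of mini-batch iteration (the data and batch size are illustrative assumptions): shuffle once, then yield successive slices so that each example appears in exactly one batch.

```python
import random

def minibatches(data, batch_size, seed=0):
    # Shuffle indices once per epoch, then yield successive slices;
    # every example lands in exactly one batch.
    rng = random.Random(seed)
    idx = list(range(len(data)))
    rng.shuffle(idx)
    for start in range(0, len(idx), batch_size):
        yield [data[i] for i in idx[start:start + batch_size]]

data = list(range(10))
batches = list(minibatches(data, batch_size=4))
print([len(b) for b in batches])  # [4, 4, 2]
```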

(3) Choosing a proper activation function
An activation function applies a nonlinear mapping to a layer's output (for example a convolutional layer's), and the choice of function matters.

- The Sigmoid function is smooth, continuous, and differentiable, and its greatest advantage is its nonlinearity. However, both of its tails are nearly flat, so saturated units receive almost no gradient, learning stalls, and gradients vanish as they propagate backward.
- The ReLU function is the most widely used activation in modern network design: it is a simple nonlinear mapping that alleviates the vanishing-gradient problem.
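
The difference in gradient behavior can be sketched directly (the input values are illustrative): sigmoid's derivative peaks at 0.25 and collapses in the tails, while ReLU passes a full gradient of 1 on its active side.

```python
import math

def sigmoid_grad(z):
    # Derivative of sigmoid: a * (1 - a); at most 0.25 (at z = 0),
    # and essentially zero for |z| large (saturation).
    a = 1.0 / (1.0 + math.exp(-z))
    return a * (1.0 - a)

def relu_grad(z):
    # Derivative of ReLU: exactly 1 wherever the unit is active.
    return 1.0 if z > 0 else 0.0

print(sigmoid_grad(10.0))  # ~ 4.5e-5: vanishing
print(relu_grad(10.0))     # 1.0: full gradient passes through
```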

(4) Choosing a proper adaptive learning rate
- If the learning rate is too large, training oscillates badly and fails to improve.
- If the learning rate is too small, the descent is too slow and training takes very long.
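
Both failure modes show up even on the toy objective f(x) = x². This is an illustrative sketch with made-up learning rates, not a prescription:

```python
def descend(lr, steps=50, x0=1.0):
    # Plain gradient descent on f(x) = x**2, whose gradient is 2x.
    x = x0
    for _ in range(steps):
        x -= lr * 2.0 * x
    return abs(x)

print(descend(1.1))    # huge: too large, every step overshoots and diverges
print(descend(0.001))  # ~ 0.90: too small, barely any progress in 50 steps
print(descend(0.3))    # ~ 0: a suitable rate converges quickly
```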

(5) Using momentum
Applying momentum on top of the gradient helps the optimizer break out of local minima.
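
A sketch of why momentum helps (the coefficients 0.9 and 0.1 are typical but illustrative): under a persistent gradient, the accumulated velocity grows to roughly 1/(1-beta) times the plain gradient step, which is what carries the iterate through shallow local minima.

```python
def momentum_velocity(steps, beta, lr=0.1, grad=1.0):
    # Classical momentum update under a constant gradient:
    #   v <- beta * v - lr * grad
    v = 0.0
    for _ in range(steps):
        v = beta * v - lr * grad
    return v

print(momentum_velocity(50, beta=0.9))  # ~ -1.0: ten times the plain step
print(momentum_velocity(50, beta=0.0))  # -0.1: plain gradient descent step
```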

If all five of the above are chosen well and the results are still poor, the model is overfitting. The following methods can prevent overfitting:

- 1. Early stopping. Split the data into a training set and a validation set: the training set is used to compute gradients and update the weights and biases, while the validation set is used to estimate the error. If the training error keeps decreasing but the validation error starts to rise, stop training and return the weights and biases that achieved the smallest validation error.
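
The stopping rule above can be sketched as follows; the patience parameter and the loss curve are illustrative assumptions, not from the original:

```python
def early_stopping(val_losses, patience=2):
    # Stop once validation loss has not improved for `patience` epochs;
    # return the epoch (and loss) of the best validation checkpoint.
    best_loss, best_epoch, bad = float("inf"), 0, 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch, bad = loss, epoch, 0
        else:
            bad += 1
            if bad >= patience:
                break
    return best_epoch, best_loss

# Validation loss falls, then rises as the model starts to overfit:
print(early_stopping([0.9, 0.6, 0.5, 0.55, 0.7, 0.8]))  # (2, 0.5)
```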

- 2. Weight Decay. At every update the weights are additionally multiplied by a decay factor slightly less than one, shrinking them toward zero, so that in the later stages of training the weights grow ever more slowly and the model is kept from fitting noise with large weights.
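
A sketch of the weight-decay update (the learning rate and decay coefficient are illustrative): even with a zero loss gradient, the decay term keeps pulling the weight toward zero.

```python
def weight_decay_step(w, grad, lr=0.1, lam=0.01):
    # L2 weight decay: shrink the weight slightly each step,
    # then apply the usual gradient update.
    return (1.0 - lr * lam) * w - lr * grad

w = 10.0
for _ in range(100):
    w = weight_decay_step(w, grad=0.0)
print(w)  # ~ 9.05: shrinks toward 0 even with zero loss gradient
```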

- 3. Dropout. Dropout is a form of regularization: during training, each neuron's path is switched off with some probability, blocking information from passing through it. Because a different set of neurons is dropped each time, a different network model is obtained each time, and the final prediction fuses these models together.
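
A sketch of inverted dropout at the level of one layer's activations (the drop probability and activation values are illustrative): at train time, surviving units are rescaled so the expected activation is unchanged, and at test time the layer is simply the identity.

```python
import random

def dropout_forward(activations, p_drop=0.5, train=True, seed=0):
    # Inverted dropout: zero each unit with probability p_drop at train
    # time, and rescale survivors by 1/(1 - p_drop) so the expectation
    # matches test time, where the layer passes activations through as-is.
    if not train:
        return list(activations)
    rng = random.Random(seed)
    keep = 1.0 - p_drop
    return [a / keep if rng.random() < keep else 0.0
            for a in activations]

acts = [0.5, 1.0, -0.3, 2.0]
print(dropout_forward(acts))               # some zeros, survivors doubled
print(dropout_forward(acts, train=False))  # unchanged at test time
```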

- 4. Adjusting the network structure.