## Efficientnet解读

Author:张一极

efficientnet致力于寻找一个方法,去平衡flops和一些sota结果的网络架构之间的关系,首先给出了影响网络效率的几个参数,图像的尺寸,网络深度,网络的广度,面临的问题就是过于深入的网络,会造成梯度弥散等问题,精度不增反降,引出了resnet的发明,而后成为了imagenet的冠军之作,在这里:

ResNet (He et al., 2016) can be scaled up from ResNet-18 to ResNet-200 by using more layers; Recently, GPipe (Huang et al., 2018) achieved 84.3% Ima- geNet top-1 accuracy by scaling up a baseline model

In particular, EfficientNet-B7 achieves new state-of-the-art 84.4% top-1 accuracy but being 8.4x smaller and 6.1x faster than GPipe. EfficientNet-B1 is 7.6x smaller and 5.7x faster than ResNet-152.

(a) is a baseline network example; (b)-(d) are conventional scaling that only increases one dimension of network width, depth, or resolution. (e) is our proposed compound scaling method that uniformly scales all three dimensions with a fixed ratio.

a是基准网络

b-d是不同尺度的拉伸后的网络

e是通过自适应尺度拉伸的网络

then we can simply increase the network depth by αN, width by βN, and image size by γN, where α,β,γ are constant coefficients determined by a small grid search on the original small model.

$\mathcal{N}=\bigoplus_{i=1 \ldots s} \mathcal{F}_{i}^{L_{i}}\left(X_{\left\langle H_{i}, W_{i}, C_{i}\right\rangle}\right)$

ResNet-1000 has similar accuracy as ResNet-101 even though it has much more layers. Figure 3 (middle) shows our empirical study on scaling a baseline model with different depth coefficient d, further suggesting the diminishing accuracy return for very deep ConvNets.

If we only scale network width w without changing depth (d=1.0) and resolution (r=1.0), the accuracy saturates quickly. With deeper (d=2.0) and higher resolution (r=2.0), width scaling achieves much better accuracy under the same FLOPS cost. These results lead us to the second observation:

In fact, a few prior work (Zoph et al., 2018; Real et al., 2019) have already tried to arbitrarily balance network width and depth, but they all require tedious manual tuning.

\begin{aligned} \text { depth: } d &=\alpha^{\phi} \\ \text { width: } w &=\beta^{\phi} \\ \text { resolution: } & r=\gamma^{\phi} \\ \text { s.t. } \alpha & \cdot \beta^{2} \cdot \gamma^{2} \approx 2 \\ \alpha & \geq 1, \beta \geq 1, \gamma \geq 1 \end{aligned}

${\phi}$是一个具体的系数,来规范模型资源用量的值,在本paper中:

In this paper, we constraint α · β2 · γ2 ≈ 2 such that for any new φ, the total FLOPS will approximately3 increase by 2φ.

ACC(m)×[FLOPS(m)/T]w as the optimization goal, where ACC(m) and FLOPS(m) denote the accu- racy and FLOPS of model m

All EfficientNet models are scaled from our baseline EfficientNet-B0 using different compound coefficient φ in Equation 3. ConvNets with similar top-1/top-5 accuracy are grouped together for efficiency comparison. Our scaled EfficientNet models consistently reduce parameters and FLOPS by an order of magnitude (up to 8.4x parameter reduction and up to 16x FLOPS reduction) than existing ConvNets.

https://arxiv.org/pdf/1905.11946.pdf