Compiled by: 张一极, 2020-03-26 22:08
A sentence quoted from the paper:
In an Inception module, the input is split into a few lower-dimensional embeddings (by 1×1 convolutions), transformed by a set of specialized filters (3×3, 5×5, etc.), and merged by concatenation
In Inception, we saw that stacking combinations of network blocks works very well for building deep features; each module follows a split-transform-merge (STM) rule:
split-transform-merge
split: the input feature map is sent through several grouped convolution paths, extracting features under different receptive fields;
transform: convolutions of several kernel sizes transform the feature map on each path;
merge: the outputs of the grouped paths are merged back together at the end (a short code sketch of this pattern follows below).
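To make split-transform-merge concrete, here is a minimal PyTorch sketch of an Inception-style module; the module name and channel counts are my own illustrative choices, not values from the paper. The input is split by 1×1 convolutions into lower-dimensional embeddings, transformed by 3×3 and 5×5 filters, and merged by concatenation.

```python
import torch
import torch.nn as nn

class ToyInceptionModule(nn.Module):
    """Illustrative split-transform-merge block (channel sizes are arbitrary)."""
    def __init__(self, in_channels):
        super().__init__()
        # split: 1x1 convolutions project the input into lower-dimensional embeddings
        self.branch3x3 = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=1),
            nn.ReLU(inplace=True),
            # transform: a specialized 3x3 filter on this branch
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
        )
        self.branch5x5 = nn.Sequential(
            nn.Conv2d(in_channels, 16, kernel_size=1),
            nn.ReLU(inplace=True),
            # transform: a specialized 5x5 filter on this branch
            nn.Conv2d(16, 32, kernel_size=5, padding=2),
        )

    def forward(self, x):
        # merge: concatenate the branch outputs along the channel dimension
        return torch.cat([self.branch3x3(x), self.branch5x5(x)], dim=1)

# quick shape check
out = ToyInceptionModule(64)(torch.randn(1, 64, 56, 56))
print(out.shape)  # torch.Size([1, 96, 56, 56])
```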
As shown in the figure below:

Multi-branch convolutional networks. The Inception models [38, 17, 39, 37] are successful multi-branch architectures where each branch is carefully customized. ResNets [14] can be thought of as two-branch networks where one branch is the identity mapping. Deep neural decision forests [22] are tree-patterned multi-branch networks with learned splitting functions.
Multi-branch networks: Inception is a multi-branch convolutional design in which every branch is carefully hand-crafted; ResNet can be viewed as a two-branch network in which one branch is the identity mapping and the other carries the residual transform; ResNeXt uses many parallel branches that all share the same topology.
Grouped convolutions. The use of grouped convolutions dates back to the AlexNet paper [24], if not earlier. The motivation given by Krizhevsky et al. [24] is for distributing the model over two GPUs. Grouped convolutions are supported by Caffe [19], Torch [3], and other libraries, mainly for compatibility of AlexNet. To the best of our knowledge, there has been little evidence on exploiting grouped convolutions to improve accuracy. A special case of grouped convolutions is channel-wise convolutions in which the number of groups is equal to the number of channels. Channel-wise convolutions are part of the separable convolutions in [35].
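The channel-wise special case mentioned in the quote maps directly onto PyTorch's groups argument; the snippet below is a small illustration of my own (channel count chosen arbitrarily), setting groups equal to the number of channels so that each channel is filtered independently.

```python
import torch
import torch.nn as nn

channels = 32
# groups == channels: each input channel gets its own 3x3 filter (channel-wise / depthwise conv)
channel_wise = nn.Conv2d(channels, channels, kernel_size=3, padding=1,
                         groups=channels, bias=False)

x = torch.randn(1, channels, 28, 28)
print(channel_wise(x).shape)                              # torch.Size([1, 32, 28, 28])
print(sum(p.numel() for p in channel_wise.parameters()))  # 32 * 1 * 3 * 3 = 288
```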
Network structure:

The original ResNet bottleneck produces its output through three convolutional layers; ResNeXt replaces this with 32 parallel grouped convolution paths whose outputs are aggregated at the end:

The difference between forms (a) and (b) is that the 32 parallel convolutions with 4 output channels each are merged into a single convolution with 128 kernels; the two forms are mathematically equivalent.
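A quick numerical check of this equivalence (my own sketch, not code from the paper): a single 128-filter convolution with groups=32 produces the same output as 32 independent 4-channel convolutions applied to channel slices of the input and then concatenated.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

groups, d = 32, 4                       # 32 paths, 4 channels per path
grouped = nn.Conv2d(groups * d, groups * d, kernel_size=3, padding=1,
                    groups=groups, bias=False)

x = torch.randn(1, groups * d, 8, 8)
y_grouped = grouped(x)

# Re-run each group as its own small convolution on the matching channel slice
outs = []
for g in range(groups):
    w = grouped.weight[g * d:(g + 1) * d]    # weights belonging to the g-th path
    x_g = x[:, g * d:(g + 1) * d]            # input channels seen by the g-th path
    outs.append(F.conv2d(x_g, w, padding=1))
y_branches = torch.cat(outs, dim=1)

print(torch.allclose(y_grouped, y_branches, atol=1e-6))  # True
```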
The advantage of grouped convolutions, seen from the parameter count:

Standard convolution:
input channels = 128
output channels = 128
kernel size = 3×3
parameters: 128 × 128 × 3 × 3 = 147,456

Grouped convolution (groups = 2, so each group sees 64 input and 64 output channels):
input channels per group = 64
output channels per group = 64
kernel size = 3×3
groups = 2
parameters: 64 × 64 × 3 × 3 × 2 = 73,728

This cuts the parameter count in half, which is nothing short of elegant.
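The two parameter counts above can be verified directly with PyTorch (a small sketch I added for checking, not part of the original comparison):

```python
import torch.nn as nn

dense = nn.Conv2d(128, 128, kernel_size=3, bias=False)
grouped = nn.Conv2d(128, 128, kernel_size=3, groups=2, bias=False)

print(sum(p.numel() for p in dense.parameters()))    # 128 * 128 * 3 * 3     = 147456
print(sum(p.numel() for p in grouped.parameters()))  # 2 * (64 * 64 * 3 * 3) = 73728
```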
Code:
```python
import torch.nn as nn
import torch.nn.functional as F

# output channels of the three convolution layers inside one block, at their different scales
layer_1 = 128
layer_2 = 128
layer_3 = 256


class ResNextBottleNeck(nn.Module):
    # ResNeXt bottleneck block (class header reconstructed around the original fragment)
    def __init__(self, in_channels, out_channels, stride):
        super().__init__()
        C = 32  # cardinality: the number of grouped-convolution paths
        self.split_transforms = nn.Sequential(
            nn.Conv2d(in_channels, 128, kernel_size=1, groups=C, bias=False),
            nn.BatchNorm2d(128),
            nn.ReLU(inplace=True),
            nn.Conv2d(128, 128, kernel_size=3, stride=stride, groups=C, padding=1, bias=False),
            nn.BatchNorm2d(128),
            nn.ReLU(inplace=True),
            nn.Conv2d(128, out_channels * 4, kernel_size=1, bias=False),
            nn.BatchNorm2d(out_channels * 4),
        )
        self.shortcut = nn.Sequential()  # identity shortcut when no downsampling is needed
        if stride != 1 or in_channels != out_channels * 4:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_channels, out_channels * 4, stride=stride, kernel_size=1, bias=False),
                nn.BatchNorm2d(out_channels * 4)
            )

    def forward(self, x):
        return F.relu(self.split_transforms(x) + self.shortcut(x))
```

The definition of the full network is basically the same as ResNet:
```python
class ResNext(nn.Module):

    def __init__(self, block, num_blocks, class_names=100):
        super().__init__()
        self.in_channels = 64

        self.conv1 = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=1, padding=1, bias=False),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True)
        )

        self.conv2 = self.make_layer(block, num_blocks[0], 64, 1)
        self.conv3 = self.make_layer(block, num_blocks[1], 128, 2)
        self.conv4 = self.make_layer(block, num_blocks[2], 256, 2)
        self.conv5 = self.make_layer(block, num_blocks[3], 512, 2)
        self.avg = nn.AdaptiveAvgPool2d((1, 1))
        self.fc = nn.Linear(512 * 4, 100)

    def make_layer(self, block, num_block, out_channels, stride):
        """Building resnext block

        Args:
            block: block type (default resnext bottleneck c)
            num_block: number of blocks per layer
            out_channels: output channels per block
            stride: block stride

        Returns:
            a resnext layer
        """
        strides = [stride] + [1] * (num_block - 1)
        layers = []
        for stride in strides:
            layers.append(block(self.in_channels, out_channels, stride))
            self.in_channels = out_channels * 4

        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.conv1(x)
        x = self.conv2(x)
        x = self.conv3(x)
        x = self.conv4(x)
        x = self.conv5(x)
        x = self.avg(x)
        x = x.view(x.size(0), -1)
        logits = self.fc(x)
        probas = F.softmax(logits, dim=1)
        return logits, probas
```

Training output:

```
Time elapsed: 186.09 min
Epoch: 002/010 | Batch 0000/25000 | Cost: 1.2423
Epoch: 002/010 | Batch 0050/25000 | Cost: 1.6971
Epoch: 002/010 | Batch 0100/25000 | Cost: 1.5043
Epoch: 002/010 | Batch 0150/25000 | Cost: 3.7873
Epoch: 002/010 | Batch 0200/25000 | Cost: 0.9755
Epoch: 002/010 | Batch 0250/25000 | Cost: 0.8241
Epoch: 002/010 | Batch 0300/25000 | Cost: 0.3721
Epoch: 002/010 | Batch 0350/25000 | Cost: 1.3209
Epoch: 002/010 | Batch 0400/25000 | Cost: 1.5591
Epoch: 002/010 | Batch 0450/25000 | Cost: 0.4816
Epoch: 002/010 | Batch 0500/25000 | Cost: 1.1595
Epoch: 002/010 | Batch 0550/25000 | Cost: 1.0004
Epoch: 002/010 | Batch 0600/25000 | Cost: 2.2381
Epoch: 002/010 | Batch 0650/25000 | Cost: 1.0034
Epoch: 005/010 | Batch 13500/25000 | Cost: 0.2052
Epoch: 005/010 | Batch 13550/25000 | Cost: 2.2128
Epoch: 005/010 | Batch 13600/25000 | Cost: 0.0544
Epoch: 005/010 | Batch 13650/25000 | Cost: 0.1864
Epoch: 005/010 | Batch 13700/25000 | Cost: 1.7534
```
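For completeness, a minimal usage sketch of my own (assuming the bottleneck class is named ResNextBottleNeck as in the reconstruction above, with a ResNeXt-50-style block configuration of [3, 4, 6, 3]):

```python
import torch

model = ResNext(ResNextBottleNeck, [3, 4, 6, 3])
logits, probas = model(torch.randn(2, 3, 32, 32))  # a batch of two 32x32 RGB images
print(logits.shape, probas.shape)  # torch.Size([2, 100]) torch.Size([2, 100])
```

The forward pass returns both the raw logits and the softmax probabilities, matching the forward method defined above.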