Analysis of convolutional neural network complexity
While reviewing the classic CNN models, I noticed that many of the innovations in their evolution are closely tied to reducing the computational complexity of the model, so today let us briefly summarize the complexity of convolutional neural networks.
1. Time complexity
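1.1 The time complexity of a single convolutional layer
Time complexity is measured by the number of multiply-add operations. For a single convolutional layer, with M the side length of the output feature map, K the side length of the convolution kernel, and C_in, C_out the numbers of input and output channels, the standard result is:

$$\text{Time} \sim O\left(M^2 \cdot K^2 \cdot C_{in} \cdot C_{out}\right)$$

where the output size M is itself determined by the input size N, kernel size K, padding P, and stride S: M = (N - K + 2P) / S + 1.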
1.2 The time complexity of the whole convolutional neural network
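Summing the per-layer cost over all D convolutional layers, with l indexing the layers (so C_{l-1} is the input channel count of layer l):

$$\text{Time} \sim O\left(\sum_{l=1}^{D} M_l^2 \cdot K_l^2 \cdot C_{l-1} \cdot C_l\right)$$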
Example: implementing a simple two-dimensional convolution by hand with NumPy.
Assume stride = 1 and padding = 0, and that img and kernel are both of type np.ndarray.
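The code itself is not reproduced above, so here is a minimal sketch of such an implementation (the names conv2d, img, and kernel are illustrative):

```python
import numpy as np

def conv2d(img, kernel):
    """Naive 2-D convolution (strictly, cross-correlation, as used in CNNs).
    Assumes stride = 1 and padding = 0; img and kernel are 2-D np.ndarray."""
    H, W = img.shape
    K, _ = kernel.shape                      # assume a square K x K kernel
    out_h, out_w = H - K + 1, W - K + 1      # output size: N - K + 1 per side
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # multiply the K x K window by the kernel and sum the products
            out[i, j] = np.sum(img[i:i + K, j:j + K] * kernel)
    return out

# usage: a 5 x 5 image convolved with a 3 x 3 mean filter gives a 3 x 3 output
img = np.arange(25, dtype=float).reshape(5, 5)
kernel = np.ones((3, 3)) / 9.0
print(conv2d(img, kernel).shape)             # (3, 3)
```

The double loop makes the M^2 · K^2 term of the time complexity directly visible: the window slides over M^2 output positions, and each position costs K^2 multiplications.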
2. Space complexity
Space complexity is the number of parameters of the model; it reflects the size of the model itself.
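In the same notation, a single convolutional layer has K^2 · C_in · C_out parameters (ignoring biases), and summed over the whole network:

$$\text{Space} \sim O\left(\sum_{l=1}^{D} K_l^2 \cdot C_{l-1} \cdot C_l\right)$$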
It can be seen that the space complexity of the network depends only on the kernel size K, the channel count C, and the depth D of the network; it is independent of the size of the input data.
When we need to prune the model, the kernel size is usually already very small, and the depth of the network is closely tied to the model's capability, so it is unwise to cut it too much. Pruning therefore usually centers on the number of channels.
3. The influence of complexity on the model
Time complexity determines the training / inference time of the model. If the complexity is too high, training and inference consume a great deal of time, so we can neither quickly validate ideas and improve the model, nor make fast predictions.
The space complexity of the model determines its number of parameters. Owing to the curse of dimensionality, the more parameters a model has, the more data is needed to train it; and since real-world datasets are usually not that large, a model with too many parameters is more prone to overfitting.
4. How do the Inception-series models optimize complexity?
Five small examples from the evolution of this model family illustrate how complexity can be optimized.
4.1 InceptionV1 uses 1 x 1 convolutions to reduce the number of channels
InceptionV1 borrowed the idea of Network in Network, constructing four parallel convolution / pooling branches with different kernel sizes (left) inside a single Inception module, which effectively increases the width of the network. But this also causes a sharp rise in the time and space complexity of the network. The solution is to insert 1 x 1 convolutions (the red modules at the upper right) that first reduce the number of input channels to a smaller value, and then perform the actual convolution.
Taking the (3b) module of the InceptionV1 paper as an example: the input size is 28 x 28 x 256; the 1 x 1 branch uses 128 kernels, and the 3 x 3 and 5 x 5 branches use 192 and 96 kernels respectively (figures from the GoogLeNet paper); every branch uses same padding so that the output spatial size is unchanged.
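A back-of-the-envelope check on the 5 x 5 branch alone (the paper specifies a 1 x 1 reduction from 256 down to 32 channels ahead of the 96 5 x 5 kernels):

$$28^2 \cdot 5^2 \cdot 256 \cdot 96 \approx 4.8 \times 10^8 \quad \text{multiply-adds without the reduction}$$

$$28^2 \cdot 1^2 \cdot 256 \cdot 32 + 28^2 \cdot 5^2 \cdot 32 \cdot 96 \approx 6.7 \times 10^7 \quad \text{multiply-adds with it}$$

roughly a 7x saving on that branch.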
4.2 InceptionV1 replaces the fully connected layer with Global Average Pooling (GAP)
It can be seen that, unlike a convolutional layer, the space complexity of a fully connected layer is closely tied to the size of the input data: the larger the input image, the larger the model. This is clearly unacceptable; in the early VGG series, for example, about 90% of the parameters were consumed by the fully connected layers.
The Global Average Pooling (GAP) used in InceptionV1 alleviates this problem. Since global average pooling reduces the feature map of each convolution kernel to a single scalar, the complexity of the subsequent fully connected layer is no longer related to the size of the input image, and both the computation and the parameter count drop dramatically. The complexity analysis is as follows:
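As an illustrative comparison, take GoogLeNet's own final shapes: a 7 x 7 x 1024 feature map feeding a 1000-way classifier. Flattening into a fully connected layer would cost 7 · 7 · 1024 · 1000 ≈ 5.0 x 10^7 parameters; with GAP reducing each of the 1024 feature maps to a scalar first, the fully connected layer needs only 1024 · 1000 ≈ 1.0 x 10^6 parameters, about 49x fewer.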
4.3 InceptionV2 replaces the 5 x 5 convolution branch with a cascade of two 3 x 3 convolutions
From the input-output size relation of two-dimensional convolution, we can see that for the same input size, a single 5 x 5 convolution produces the same output size as a cascade of two 3 x 3 convolutions; that is, the two have the same receptive field.
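Concretely, the size relation is

$$M = \frac{N - K + 2P}{S} + 1$$

With S = 1 and P = 0, a single 5 x 5 convolution maps an N x N input to (N - 4) x (N - 4); two stacked 3 x 3 convolutions give N - 2 and then N - 4 as well, so each output pixel ultimately sees the same 5 x 5 window of the input.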
Likewise, from the complexity formulas given earlier, it can be seen that this replacement effectively reduces both the time and space complexity. The savings can be reinvested in the depth and width of the model, giving it greater capacity and expressive power at unchanged complexity.
Again taking the (3b) module of InceptionV1 as an example, the complexity of the 5 x 5 branch before and after the replacement is as follows:
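As a sketch, assume for simplicity that the input and output of the branch both have C channels and the output feature map has side length M; then the replacement changes the cost as follows:

$$5^2 \cdot M^2 \cdot C^2 \;\rightarrow\; 2 \cdot 3^2 \cdot M^2 \cdot C^2 = \frac{18}{25} \cdot 5^2 \cdot M^2 \cdot C^2$$

i.e., a 28% reduction in both multiply-adds and parameters.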
4.4 InceptionV3 replaces the N x N convolution with a cascade of N x 1 and 1 x N convolutions
InceptionV3 proposes convolution factorization, which simplifies the computation further while keeping the receptive field unchanged.
The complexity improvement can be derived in the same way as above.
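Briefly: holding channels fixed, an N x N kernel costs N^2 per output position, while an N x 1 kernel followed by a 1 x N kernel costs N + N = 2N, so the cost ratio is

$$\frac{2N}{N^2} = \frac{2}{N}$$

For N = 3 this is 2/3, about a 33% saving, with the receptive field unchanged.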
4.5 Xception uses Depth-wise Separable Convolution
What we discussed so far is standard convolution, in which every convolution kernel convolves over all of the input channels.
The Xception model breaks with this convention: each convolution kernel handles only a single input channel, a scheme called Depth-wise Separable Convolution.
Viewed from the input channels: in standard convolution, every input channel is swept by every convolution kernel, whereas in Xception each input channel is swept only by its own corresponding kernel, which reduces the redundancy of the model.
Comparing the time complexity of standard convolution with that of depth-wise separable convolution, it can be seen that a multiplication turns into an addition.
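In the notation used above, a standard K x K convolution costs M^2 · K^2 · C_in · C_out multiply-adds, whereas a depth-wise K x K convolution followed by a 1 x 1 point-wise convolution (the pairing used in Xception and MobileNet) costs

$$M^2 \cdot K^2 \cdot C_{in} + M^2 \cdot C_{in} \cdot C_{out}$$

The ratio of the two is 1/C_out + 1/K^2: the product K^2 · C_out in the per-input-channel cost has turned into the sum K^2 + C_out, which is what "multiplication turns into addition" means here.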
5. Summary
Through the derivations above and the case studies of classic models, we can clearly see that many innovations are optimizations of the complexity of convolutional models, and the underlying arithmetic is nothing more than multiplications and additions. Optimizing a model toward fewer operations and fewer parameters lets us, on the one hand, build lighter and faster models (such as MobileNet), and on the other hand, build deeper and wider networks (such as Xception) that have greater capacity and can take on a wider variety of tasks.