Deep Learning: CNNs in Detail
CNN stands for Convolutional Neural Network. Much like the biological neurons it loosely imitates, different neurons in the network serve different roles. A convolutional neural network has mainly two kinds of neurons: C and S. C stands for convolution, and the convolution operation is mainly used for feature extraction. S stands for subsampling, i.e. downsampling, also called feature mapping, which is in practice a pooling operation. A convolutional neural network is therefore mainly composed of alternating C layers and S layers.

The C layer is the feature-extraction layer. Each neuron's input is connected to a local receptive field of the previous layer, from which it extracts a local feature; once a local feature has been extracted, its positional relationship to the other features is determined as well. The S layer is the feature-mapping layer. Each computational layer of the network consists of multiple feature maps; each feature map is a plane, and all neurons on that plane share the same weights. The network also includes an activation function, which is applied to the extracted features; common choices are the sigmoid function and the ReLU function. An advantage of the sigmoid is that its output range is bounded, so values do not easily diverge as they propagate through the network.
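To make the two activation functions mentioned above concrete, here is a minimal sketch in Python with NumPy (the function names are just illustrative):

```python
import numpy as np

def sigmoid(x):
    # Squashes any input into (0, 1), so values stay bounded as they propagate
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    # Passes positive values through unchanged and zeroes out negatives
    return np.maximum(0.0, x)

x = np.array([-2.0, 0.0, 3.0])
print(sigmoid(x))   # all outputs lie strictly between 0 and 1
print(relu(x))      # [0. 0. 3.]
```

Note how the sigmoid keeps every output in (0, 1), which is exactly the "not easy to diverge" property described above, while ReLU is unbounded above but cheaper to compute.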
Both convolution and downsampling in a convolutional neural network are performed through sliding windows. A convolutional layer typically uses a 3×3 or 5×5 window, which slides across the input image to produce a series of feature maps. A downsampling layer typically uses a 2×2 window, within which you can take either the mean (mean pooling) or the maximum (max pooling). The following walks through LeNet, focusing on the details of the network that are easy to overlook.
INPUT: the input image, 32×32 pixels;
C1: 6 convolution kernels of size 5×5 produce 6 feature maps of 28×28;
S2: downsampling with a 2×2 window produces 6 feature maps of 14×14;
C3: 16 convolution kernels of size 5×5; each output map takes a variable number of the S2 feature maps as input, producing 16 feature maps of 10×10. ("Variable number" means each C3 map connects to a different subset of the S2 maps — for example, one map might take S2 maps 1 and 3, another might take maps 1, 3, and 5.)
S4: downsampling with a 2×2 window produces 16 feature maps of 5×5;
C5: 120 convolution kernels of size 5×5; each output map takes all 16 S4 feature maps as input, producing 120 feature maps of 1×1. (Taking all of the feature maps as input is not mandatory — if there are too many, a subset can be selected instead, so a C5-style layer is not necessarily a fully connected layer.)
F6: 84 convolution kernels of size 1×1; each output takes all 120 C5 feature maps as input, producing 84 feature maps of 1×1.
(A layer whose every output uses all of the previous layer's maps as input is called a fully connected layer. It combines the features extracted earlier, and it requires a fixed-size input.)
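The sizes in the layer list above follow from two simple rules: a "valid" convolution shrinks each spatial dimension by the kernel size minus one, and a non-overlapping 2×2 pooling window halves it. A quick sketch of the arithmetic (function names are illustrative):

```python
def conv_out(size, kernel):
    # "Valid" convolution: the window must fit entirely inside the input
    return size - kernel + 1

def pool_out(size, window=2):
    # Non-overlapping pooling windows divide each spatial dimension
    return size // window

size = 32                   # INPUT: 32x32
size = conv_out(size, 5)    # C1: 28x28
size = pool_out(size)       # S2: 14x14
size = conv_out(size, 5)    # C3: 10x10
size = pool_out(size)       # S4: 5x5
size = conv_out(size, 5)    # C5: 1x1
print(size)  # 1
```

Running the chain reproduces exactly the 32 → 28 → 14 → 10 → 5 → 1 progression listed above.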
When a layer uses some or all of the previous layer's feature maps as input, what does the computation actually look like? Take the generation of a C3 map as an example. Suppose we use three of the S2 feature maps as input — say we select maps 1, 3, and 5. Now take one of the 16 convolution kernels and convolve it with maps 1, 3, and 5; this yields three results, which are in fact three matrices. These matrices are then combined with weights (i.e. each is given a proportion), and a bias is added, producing a single matrix: one feature map of C3.
Weights and biases keep coming up; here is a diagram I put together, along with a link that explains them very clearly:
Because a plain CNN processes the entire picture, it cannot focus on the regions that actually matter. Hence R-CNN, i.e. Region-based CNN. Its principle is to use selective search to pick out the regions likely to contain the key information, and then crop or warp those regions. The purpose of the cropping and warping is to produce a fixed-size image for the input of the fully connected layer: for ease of computation, the fully connected layer requires every region to arrive at the same fixed size. In practice, however, cropping and warping lose picture information.
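The "warp every proposal to a fixed size" step can be sketched with a simple nearest-neighbor resize. This is only an illustration of the idea (R-CNN itself uses anisotropic warping of the pixels); the region coordinates and the 7×7 target size are hypothetical:

```python
import numpy as np

def warp_to_fixed(region, out_h=7, out_w=7):
    # Nearest-neighbor resize: stretches or squeezes an arbitrary region to a
    # fixed size so a fully connected layer can accept it
    h, w = region.shape
    rows = np.arange(out_h) * h // out_h   # which source row each output row samples
    cols = np.arange(out_w) * w // out_w   # which source column each output column samples
    return region[np.ix_(rows, cols)]

image = np.arange(20 * 30).reshape(20, 30)
region = image[3:15, 5:25]       # a hypothetical proposal from selective search
fixed = warp_to_fixed(region)
print(fixed.shape)  # (7, 7) regardless of the region's original size
```

Whatever the proposal's aspect ratio, the output is always 7×7 — which is exactly why information is lost: elongated regions get squeezed unevenly.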