Convolutional neural networks (CNN) and common deep learning frameworks
One. Why are neural networks better than traditional classifiers?
1. The traditional classifiers are LR (logistic regression) and linear SVM, which perform linear separation. If every sample is viewed as a point, as in the figure below with blue points and green points, the traditional classifier tries to find a straight line that separates the two classes of samples.
For samples that are not linearly separable, kernel functions or feature maps can be added so that a curve or surface separates the samples. But the results are often poor. The main reason is that it is hard to guarantee that the sample points are distributed as regularly as the figure suggests; we cannot control their distribution. When several blue points are mixed in among the green points, they are hard to separate, and even if a curve can separate them, the curve becomes highly distorted, hard to learn, and prone to overfitting. Extracting features that separate the two classes of samples is also a problem in itself. This is why such classifiers are rarely used now.
2. How does a neural network do it?
A neural network essentially uses AND and OR operations to carve regions out of the sample space. In the figure below, each edge of the green region at the top can be seen as a linear classifier that splits the plane into a positive side and a negative side. ANDing those classifiers yields one green region, and then multiple green regions are combined with OR. A single neuron can implement an AND or an OR operation. We only need to provide samples, and the neural network learns this by itself; that is its advantage. The conclusion is that combining linear classifiers with AND and OR can classify planar sample points of essentially any distribution.
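As a minimal sketch of the claim that one neuron can implement AND or OR (not from the original article; the weights and thresholds are illustrative choices), a linear neuron with a step activation realizes both gates purely through its weights and bias:

```python
import numpy as np

def neuron(x, w, b):
    """One linear neuron with a step activation: fires iff w.x + b > 0."""
    return int(np.dot(w, x) + b > 0)

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    x = np.array(x)
    print(x,
          "AND:", neuron(x, np.array([1, 1]), -1.5),   # fires only for (1, 1)
          "OR:",  neuron(x, np.array([1, 1]), -0.5))   # fires unless (0, 0)
```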
Two. What is a convolutional neural network
A convolutional neural network is still a layered network, but the function and form of the layers have changed. The layer structure is shown below.
1. Layer structure
Its layer structure includes: data input layer / Input layer, convolution layer / CONV layer, ReLU activation layer / ReLU layer, pooling layer / Pooling layer, and fully connected layer / FC layer.
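Before walking through each layer, here is a minimal sketch of this five-layer structure in PyTorch (the article names no framework, and all sizes are illustrative assumptions for a 3*32*32 input):

```python
import torch.nn as nn

# Input -> CONV -> ReLU -> Pooling -> FC, matching the five layers listed above.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, stride=1, padding=1),  # convolution layer
    nn.ReLU(),                                             # activation layer
    nn.MaxPool2d(kernel_size=2, stride=2),                 # pooling layer
    nn.Flatten(),
    nn.Linear(16 * 16 * 16, 10),                           # fully connected layer
)
```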
(1) data input layer / Input layer
There are 3 common preprocessing methods for image data:
De-mean: center every dimension of the input data at 0, i.e., compute the mean over all samples and subtract that mean from every sample.
Normalization: scale the amplitudes into the same range, for example compressing the sample data into [0, 1].
PCA / whitening: use PCA for dimensionality reduction; whitening normalizes the magnitude along each feature axis of the data.
For images, CNNs often apply only mean subtraction.
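A minimal NumPy sketch of the three preprocessing methods (the data shape is an illustrative assumption):

```python
import numpy as np

X = np.random.rand(100, 3 * 8 * 8)        # 100 flattened 8*8 RGB images (assumed)

# De-mean: subtract the per-dimension mean computed over all samples
X_centered = X - X.mean(axis=0)

# Normalization: compress the values into [0, 1]
X_norm = (X - X.min()) / (X.max() - X.min())

# PCA / whitening: rotate onto the principal axes, then equalize each axis's scale
cov = np.cov(X_centered, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)
X_pca = X_centered @ eigvecs              # decorrelated (PCA)
X_white = X_pca / np.sqrt(eigvals + 1e-5) # whitened
```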
(2) convolution layer / CONV layer
It is no longer fully connected but locally connected. Each neuron can be seen as a filter that computes on local data by sliding a window (the receptive field) over the input.
Three other concepts appear here:
Depth /depth: in this figure it is 3, usually the three RGB color channels of the image.
Stride /stride: how far the window slides each time.
Zero-padding /zero-padding: so that the sliding window lands exactly on the boundary, the surroundings are padded with zeros; the padding width determines how many rings of zeros are added.
Here is a concrete example of convolution:
The depth of this example is 2 because there are only two filters. Each color channel has a 3*3 sliding window; the values in the window are multiplied by the corresponding weights W of the filter, each channel yields one value, and the 3 values are summed to give one entry of the Output Volume. Filter W0 convolved with the input produces the first matrix of the output volume, and filter W1 produces the second.
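A naive NumPy sketch of this windowed computation (summing over depth, with stride and zero-padding; the sizes mirror the example as assumptions, i.e. a 5*5*3 input, two 3*3 filters, stride 2, padding 1):

```python
import numpy as np

def conv2d(x, w, b, stride=1, pad=1):
    """x: (C, H, W) input; w: (C, k, k) one filter; returns one (H_out, W_out) map."""
    C, H, W = x.shape
    k = w.shape[1]
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))    # zero-padding around the edges
    H_out = (H + 2 * pad - k) // stride + 1
    W_out = (W + 2 * pad - k) // stride + 1
    out = np.zeros((H_out, W_out))
    for i in range(H_out):
        for j in range(W_out):
            window = xp[:, i*stride:i*stride+k, j*stride:j*stride+k]
            out[i, j] = np.sum(window * w) + b          # sum the per-channel values
    return out

x = np.random.rand(3, 5, 5)                             # depth 3: RGB channels
w0, w1 = np.random.rand(2, 3, 3, 3)                     # two filters -> output depth 2
output_volume = np.stack([conv2d(x, w0, 0.0, stride=2),
                          conv2d(x, w1, 0.0, stride=2)])
print(output_volume.shape)                              # (2, 3, 3)
```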
In addition, one of the most important features of the convolution layer is the parameter-sharing mechanism: the weights with which each neuron connects to its data window are fixed, i.e., the same filter weights are reused at every window position.
The fixed weights of each neuron can be regarded as a template; each neuron attends to only one feature.
The advantage is that far fewer weights need to be estimated; in AlexNet, for example, one layer drops from about 100 million parameters to about 35,000.
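A back-of-the-envelope count showing where those numbers come from (the layer sizes are an assumption matching AlexNet's first convolution layer, not stated in the article):

```python
# One conv layer: 55*55*96 output neurons, each looking at an 11*11*3 window.
neurons, window = 55 * 55 * 96, 11 * 11 * 3
print("without sharing:", neurons * window)   # 105,415,200 -> ~100 million weights
# With sharing, each of the 96 filters keeps one 11*11*3 template plus a bias.
print("with sharing:", 96 * (window + 1))     # 34,944 -> ~35,000 weights
```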
(3) the activation layer (ReLU)
It applies a nonlinear mapping to the output of the convolution layer.
The common activation functions are Sigmoid, Tanh (hyperbolic tangent), ReLU, Leaky ReLU, ELU, and Maxout; minimal definitions follow this list.
Sigmoid: the earliest in use, now basically unused, because when x is large its output is close to 1 and its gradient is close to 0; since we rely on the gradient to optimize, the weights stop updating (the vanishing-gradient problem).
ReLU: a commonly used activation function. It converges fast and its gradient is simple, but it is more fragile: when x is less than 0 the gradient is 0, so a neuron can "die" and never update again.
Leaky ReLU: it does not "saturate" or die, and it is very fast to compute.
Exponential linear unit ELU: it has all the advantages of ReLU, does not die, and its output mean tends toward 0.
Maxout: the computation is linear, it does not saturate, and it does not die, but it has many parameters (it takes the max of two linear pieces).
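Minimal NumPy definitions of these functions (a sketch; the leak and alpha constants are common defaults, not values given by the article):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    return np.tanh(x)

def relu(x):
    return np.maximum(0.0, x)

def leaky_relu(x, a=0.01):              # small slope instead of 0 for x < 0
    return np.where(x > 0, x, a * x)

def elu(x, a=1.0):                      # smooth negative part, mean pushed toward 0
    return np.where(x > 0, x, a * (np.exp(x) - 1))

def maxout(x, w1, b1, w2, b2):          # max of two linear pieces (1-D illustration)
    return np.maximum(w1 * x + b1, w2 * x + b2)
```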
Practical experience:
1) Do not use sigmoid.
2) Try ReLU first, because it is fast, but be careful: it can die.
3) If 2) fails, use Leaky ReLU or Maxout.
4) In some cases tanh gives good results, but rarely.
(4) pooling layer / Pooling layer
It is usually sandwiched between successive convolution layers; its role is to compress the data and the number of parameters and to reduce overfitting.
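A minimal max-pooling sketch showing the compression (the 2*2 window and stride 2 are common defaults, assumed here):

```python
import numpy as np

def max_pool(x, k=2, stride=2):
    """x: (H, W) feature map; keep the max of each k*k window."""
    H, W = x.shape
    H_out, W_out = (H - k) // stride + 1, (W - k) // stride + 1
    out = np.zeros((H_out, W_out))
    for i in range(H_out):
        for j in range(W_out):
            out[i, j] = x[i*stride:i*stride+k, j*stride:j*stride+k].max()
    return out

x = np.arange(16, dtype=float).reshape(4, 4)
print(max_pool(x))    # the 4*4 map is compressed to 2*2
```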
(5) fully connected layer / FC layer
All neurons between the two layers are connected by weights; fully connected layers are usually placed at the tail of a convolutional neural network.
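A minimal sketch of that all-to-all weighted connection (the sizes are illustrative assumptions):

```python
import numpy as np

x = np.random.rand(16, 4, 4)           # pooled feature maps at the tail of the net
W = np.random.rand(10, 16 * 4 * 4)     # every output neuron connects to every input
b = np.random.rand(10)
scores = W @ x.reshape(-1) + b         # 10 class scores
print(scores.shape)                    # (10,)
```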