An introduction to deep learning neural networks
Single neuron structure
For a single neuron, suppose we give each input a weight: for example, with three inputs (x1, x2, x3) and corresponding weights (w1, w2, w3), the neuron computes the weighted sum WX^T = w1*x1 + w2*x2 + w3*x3 and uses it to produce the output.
Now set a threshold: if the weighted sum is greater than the threshold, output 1; if it is less than the threshold, output 0. Setting bias = -threshold simplifies this description.
output = 1, if WX^T > threshold
output = 0, if WX^T <= threshold
Letting bias = -threshold, this becomes:
output = 1, if WX^T + bias > 0
output = 0, if WX^T + bias <= 0
With this setup, we can model a yes/no decision problem. Suppose we want to judge whether a picture contains a dog: output = 1 means yes, output = 0 means no. Given the weights (W) and bias, we only need to feed in the input and read off the output to get the answer we want.
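As a concrete illustration, here is a minimal Python sketch of such a threshold neuron; the weights, bias, and input are made-up values for the example, not parameters from any trained model.

```python
# Minimal threshold neuron; all numbers below are illustrative.
def threshold_neuron(x, w, bias):
    weighted_sum = sum(wi * xi for wi, xi in zip(w, x))
    return 1 if weighted_sum + bias > 0 else 0

w = [0.5, -0.2, 0.8]   # hypothetical weights (w1, w2, w3)
bias = -0.3            # bias = -threshold
x = [1.0, 0.5, 0.2]    # hypothetical input (x1, x2, x3)
print(threshold_neuron(x, w, bias))  # 1 -> "there is a dog", 0 -> "there is not"
```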
To describe more subtle changes, we extend the output from the two values 0 and 1 to the continuous interval [0, 1], which can be done with the sigmoid function:
sigmoid(z) = 1 / (1 + e^(-z)), applied as output = sigmoid(WX^T + bias)
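A small sketch of the corresponding sigmoid neuron, reusing the same illustrative values as above; the output now varies smoothly in (0, 1):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_neuron(x, w, bias):
    weighted_sum = sum(wi * xi for wi, xi in zip(w, x))
    return sigmoid(weighted_sum + bias)  # output now lies in (0, 1)

# Same hypothetical weights, bias, and input as before.
print(sigmoid_neuron([1.0, 0.5, 0.2], [0.5, -0.2, 0.8], -0.3))  # ~0.565
```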
Neural network structure
A structure composed of multiple interconnected neurons is called a neural network.
Common terms in neural networks:
Forward propagation
Gradient descent method
Backpropagation
and so on. You may have read plenty of material and still feel lost about what they actually do; I hope this explanation can lead you out of the maze!
As noted above, given the weights and biases, each input yields a corresponding output: the input layer (input_1, input_2, input_3) feeds the hidden layer to produce three intermediate values (a_1, a_2, a_3), and these intermediate values are then fed to the output layer to obtain the final output. This is the process of forward propagation.
That is, once the weights and biases are fixed, the model is established, and given any input the output can be computed.
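Below is a minimal sketch of forward propagation through the 3-3-1 network described above, assuming sigmoid activations; all weights and biases are random placeholders rather than trained values.

```python
import math, random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer_forward(a_prev, weights, biases):
    # One layer: each neuron computes sigmoid(w . a_prev + b).
    return [sigmoid(sum(w * a for w, a in zip(ws, a_prev)) + b)
            for ws, b in zip(weights, biases)]

random.seed(0)
# Placeholder parameters for a 3-3-1 network: 3*3 + 3 + 3*1 + 1 = 16 in total.
W1 = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(3)]
b1 = [random.uniform(-1, 1) for _ in range(3)]
W2 = [[random.uniform(-1, 1) for _ in range(3)]]
b2 = [random.uniform(-1, 1)]

x = [1.0, 0.5, 0.2]                 # input (input_1, input_2, input_3)
hidden = layer_forward(x, W1, b1)   # intermediate values (a_1, a_2, a_3)
output = layer_forward(hidden, W2, b2)
print(output)                       # final output of the network
```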
So, the question is: if we have a pile of training data, how do we get the model? That is, how do we get the corresponding weights and biases (in the simple model above there are 3*3 + 3 + 3*1 + 1 = 16 parameters: nine weights and three biases for the hidden layer, plus three weights and one bias for the output layer)?
This is a process of parameter estimation.
A quick aside: a model is determined by its parameters, and two problems commonly arise: (1) the model parameters are known, and the output is computed from the input; (2) the model parameters are unknown, and they must be estimated from data using mathematical methods.
We estimate the parameters by minimizing the difference between the model's output and the true values.
Objective function:
C(W, b) = (1 / 2n) * sum_x || y(x) - a ||^2, where the sum runs over the n training inputs x
Here y(x) is the true value corresponding to x, and a is the output computed by the network model. The smaller the difference between the two, the more reasonable the model parameters; this needs no further argument. Note that the only free parameters of the objective function are W and b, so the goal is to find the W and b that minimize it.
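As a sketch, the objective function can be written directly from its definition; the one-parameter model and the toy data here are purely illustrative:

```python
def quadratic_cost(model, data):
    # data: list of (x, y) pairs, where y is the true value for input x.
    n = len(data)
    return sum((y - model(x)) ** 2 for x, y in data) / (2 * n)

# Hypothetical one-parameter "model" and made-up data for illustration.
model = lambda x: 0.9 * x
data = [(1.0, 1.0), (2.0, 2.1), (3.0, 2.9)]
print(quadratic_cost(model, data))  # smaller value -> more reasonable parameters
```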
This minimization problem is generally solved by the gradient descent method.
Update equations for the weights and biases:
w_k -> w_k' = w_k - eta * dC/dw_k
b_l -> b_l' = b_l - eta * dC/db_l
(eta is the learning rate)
Once the partial derivatives are known, iterating these updates drives the objective function down toward a minimum.
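A minimal gradient descent sketch on a one-parameter toy objective (the function, starting point, and learning rate are all illustrative):

```python
# Minimize C(w) = (w - 3)^2 by gradient descent; dC/dw = 2 * (w - 3).
w = 0.0        # initial guess
eta = 0.1      # learning rate
for _ in range(100):
    grad = 2 * (w - 3)   # partial derivative of the objective
    w = w - eta * grad   # the update equation from above
print(w)  # converges toward the minimizer w = 3
```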
So the problem now is how to obtain the partial derivatives with respect to the many parameters of the network model. This is where the famous BP algorithm (Back-Propagation) comes in.
The purpose of the BP algorithm is to solve exactly this problem: computing the partial derivatives dC/dw and dC/db of the cost with respect to every weight and bias in the network.
The core of BP is four formulas (their derivations will be shared with readers later):
Formula 1: delta^L = grad_a C ⊙ sigma'(z^L)    (⊙ is the elementwise product)
Formula 2: delta^l = ((w^{l+1})^T delta^{l+1}) ⊙ sigma'(z^l)
Formula 3: dC/db_j^l = delta_j^l
Formula 4: dC/dw_{jk}^l = a_k^{l-1} * delta_j^l
Formula 1 gives the error delta^L of the last layer, computed from the network output a^L and the true value y; for the quadratic cost above, grad_a C = a^L - y.
Formula 2 derives the error of a layer from the error of the layer after it, propagating backwards; written out per neuron:
delta_j^l = sum_k ( w_{kj}^{l+1} * delta_k^{l+1} ) * sigma'(z_j^l)
From formulas 1 and 2, every delta_j^l in the network can be obtained; combining formulas 3 and 4 then yields all the partial derivatives, and the gradient descent method can be applied to iterate.
In this way, the parameter estimation algorithm flow is as follows:
Step1: initialize the weights and biases;
Step2: input a training example x and set the input-layer activation a^{input} = x;
Step3: forward pass: for each layer l = 2, 3, ..., L, compute the activations (that is, the output of each neuron):
z^l = w^l a^{l-1} + b^l,  a^l = sigma(z^l)
Step4: compute the output-layer error delta^L:
delta^L = grad_a C ⊙ sigma'(z^L)
Step5: backward pass: propagate the error back to obtain delta^l for each layer l = L-1, ..., 2:
delta^l = ((w^{l+1})^T delta^{l+1}) ⊙ sigma'(z^l)
Step6: obtain each partial derivative:
dC/dw_{jk}^l = a_k^{l-1} * delta_j^l,  dC/db_j^l = delta_j^l
Step7: update the weights and biases:
w^l -> w^l - eta * delta^l (a^{l-1})^T,  b^l -> b^l - eta * delta^l
The updated weights and biases are then fed back into the computation, and the process repeats until the objective function converges, that is, until its value is essentially unchanged between successive iterations.
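Putting Steps 1-7 together, here is a minimal training sketch for the 3-3-1 network, assuming the quadratic cost and sigmoid activations; the training example, learning rate, and iteration count are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

rng = np.random.default_rng(0)
# Step1: initialize weights and biases for a 3-3-1 network (16 parameters).
W1, b1 = rng.standard_normal((3, 3)), rng.standard_normal(3)
W2, b2 = rng.standard_normal((1, 3)), rng.standard_normal(1)

# Toy training example: x is the input, y_true the true value (both made up).
x = np.array([1.0, 0.5, 0.2])
y_true = np.array([1.0])
eta = 0.5  # learning rate

for _ in range(1000):
    # Step2/Step3: forward pass, caching z^l for the backward pass.
    z1 = W1 @ x + b1;  a1 = sigmoid(z1)
    z2 = W2 @ a1 + b2; a2 = sigmoid(z2)

    # Step4 (Formula 1): output-layer error for the quadratic cost.
    delta2 = (a2 - y_true) * sigmoid_prime(z2)
    # Step5 (Formula 2): propagate the error back to the hidden layer.
    delta1 = (W2.T @ delta2) * sigmoid_prime(z1)

    # Step6 (Formulas 3 and 4): all partial derivatives of C.
    dW2, db2 = np.outer(delta2, a1), delta2
    dW1, db1 = np.outer(delta1, x), delta1

    # Step7: gradient descent update of every weight and bias.
    W2 -= eta * dW2; b2 -= eta * db2
    W1 -= eta * dW1; b1 -= eta * db1

print(a2)  # after training, the output should be close to y_true
```

Each loop iteration performs Steps 3 to 7 for a single training example; in practice the updates are usually averaged over mini-batches of examples.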
Summary
Parameters determine the model. In a neural network, the weights and biases determine the network model. Two problems usually arise: computing the output from the input when the model parameters are known, and estimating the parameters from data when they are unknown.
Forward propagation: the process of computing the output for different inputs when the model parameters (weights and biases) are known.
The core of BP is four formulas, whose purpose is to compute the partial derivatives so that the gradient descent method can minimize the objective function.