Summary of deep learning knowledge
This post summarizes some common knowledge points in deep learning. Before reading it, you should be at least familiar with deep learning and machine learning, and ideally with some tricks from probability and statistics as well.
So let's begin.
Convolutional neural network
Haha, you may be mixed up with the convolution formula from the Fourier transform, but the convolution here has basically nothing to do with that.
In fact, a convolution kernel (Kernel) is just a matrix: the inner product between the kernel and an image patch containing the pattern it is meant to detect is large. At some level, it is a filter, though not in the strict signal-processing sense.
The figure below is the visualization of the convolution kernel.
[Figure: visualization of learned convolution kernels]
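To make the "kernel as inner product" idea concrete, here is a minimal hand-rolled 2-D convolution in numpy. This is a sketch of my own, not code from the post; the vertical-edge kernel and the toy image are illustrative choices.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D cross-correlation: slide the kernel over the image and
    take the inner product at every position (no padding, stride 1)."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

# A vertical-edge kernel responds strongly where the image has a vertical edge.
kernel = np.array([[1., 0., -1.],
                   [1., 0., -1.],
                   [1., 0., -1.]])
image = np.zeros((5, 5))
image[:, :2] = 1.0            # bright left half, dark right half
print(conv2d(image, kernel))  # large values at the edge, zero elsewhere
```

The inner product is largest exactly where the window straddles the bright-to-dark transition, which is the "filter" behavior described above.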
Pooling
You can understand pooling in parallel with convolution: they are different operations, but they almost always appear together (their relationship is like Nezha and Haier, or something like that...).
The pooling layer compresses the input feature map: on one hand, it makes the feature map smaller and simplifies the network's computation; on the other hand, it extracts the dominant features. Its underlying assumption is that pixels in adjacent positions of an image are correlated. A minimal sketch follows below.
[Figure: pooling compressing a feature map]
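Here is a minimal sketch of pooling, assuming the common max-pooling variant (the post does not pin down which pooling it means; average pooling works the same way with `mean` instead of `max`):

```python
import numpy as np

def max_pool2d(fmap, size=2, stride=2):
    """Downsample a feature map by taking the max of each size x size window."""
    oh = (fmap.shape[0] - size) // stride + 1
    ow = (fmap.shape[1] - size) // stride + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            window = fmap[i*stride:i*stride+size, j*stride:j*stride+size]
            out[i, j] = window.max()
    return out

fmap = np.arange(16, dtype=float).reshape(4, 4)
print(max_pool2d(fmap))  # 4x4 -> 2x2, keeping the strongest activation per window
```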
LSTM
A neural network is, at heart, a universal function approximator. But now we need to handle time-series models: stock prices, for example, are high-frequency time series in which each hour is obviously related to the previous one, and natural language has the same sequential structure.
So we first have to talk about RNNs: Recurrent Neural Networks.
[Figure: an RNN unrolled through time]
The network unit $A$ at time step $t$ takes two inputs:
$$h_t = A(h_{t-1}, x_t)$$
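The post does not spell out what $A$ computes internally; assuming the usual vanilla-RNN parameterization $h_t = \tanh(W_h h_{t-1} + W_x x_t + b)$, a minimal sketch looks like this (all sizes are illustrative):

```python
import numpy as np

def rnn_cell(h_prev, x_t, W_h, W_x, b):
    """One step of h_t = A(h_{t-1}, x_t) = tanh(W_h h_{t-1} + W_x x_t + b)."""
    return np.tanh(W_h @ h_prev + W_x @ x_t + b)

hidden, n_in, T = 4, 3, 5
rng = np.random.default_rng(0)
W_h = rng.normal(scale=0.1, size=(hidden, hidden))
W_x = rng.normal(scale=0.1, size=(hidden, n_in))
b = np.zeros(hidden)

h = np.zeros(hidden)
for t in range(T):                     # unroll over the sequence
    x_t = rng.normal(size=n_in)        # stand-in for the input at time t
    h = rnn_cell(h, x_t, W_h, W_x, b)  # h carries information forward in time
print(h)
```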
But RNNs are not good at learning long-term dependencies, mainly because gradients propagated along a long chain tend to vanish or explode, so training does not work well.
Long Short-Term Memory networks, commonly called LSTMs, are a special kind of RNN that can learn long-term dependencies. The LSTM was proposed by Hochreiter & Schmidhuber (1997) and was later refined and popularized by Alex Graves. LSTMs have been hugely successful on many problems and are widely used.
[Figure: the chain structure of an LSTM]
So an LSTM is really just a few nonlinear gates, each of which is itself a small neural network; such a unit $A$ is called an LSTM cell.
[Figure: the gates inside an LSTM cell]
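To show how "each gate is a small neural network", here is a minimal numpy sketch of the standard LSTM gate equations; the weight layout (all four gates stacked into one matrix) and the sizes are my own illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell(x_t, h_prev, c_prev, W, b):
    """One LSTM step: each gate is a small network over [h_{t-1}, x_t]."""
    z = W @ np.concatenate([h_prev, x_t]) + b
    H = h_prev.shape[0]
    f = sigmoid(z[0:H])        # forget gate: what to drop from the cell state
    i = sigmoid(z[H:2*H])      # input gate: what new information to store
    o = sigmoid(z[2*H:3*H])    # output gate: what to expose as h_t
    g = np.tanh(z[3*H:4*H])    # candidate cell update
    c_t = f * c_prev + i * g   # cell state mixes old memory and new content
    h_t = o * np.tanh(c_t)
    return h_t, c_t

H, D = 4, 3
rng = np.random.default_rng(1)
W = rng.normal(scale=0.1, size=(4*H, H+D))
b = np.zeros(4*H)
h, c = np.zeros(H), np.zeros(H)
h, c = lstm_cell(rng.normal(size=D), h, c, W, b)
```

The additive update of the cell state $c_t$ is what lets gradients flow over long spans, addressing the vanishing-gradient problem mentioned above.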
Backpropagation through the LSTM
[Figure: backpropagation through the LSTM cell]
VAE
No, I am not talking about Xu Song (the singer who goes by Vae). We are talking about the Variational Auto-Encoder. In an auto-encoder, the encoder turns a sample into a feature vector, and the decoder decodes that feature vector back into the original sample. This is similar to encryption and decryption in cryptography.
Both the encoder and the decoder are neural networks.
[Figure: encoder-decoder structure of an auto-encoder]
Note that the idea behind the variational auto-encoder is the foundation of the currently fashionable GAN. When we encode a sample $x$ into a code $z$, we simultaneously train a decoder to generate $\hat{x}$ from $z$ by minimizing $\|\hat{x} - x\|$. At the end we throw the encoder away, and $z$ has acquired real semantic meaning! Varying $z$ and feeding it through the decoder generates fake samples $\hat{x}$.
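As a sketch of this encode/decode/minimize-$\|\hat{x}-x\|$ loop, here is a plain (non-variational) linear auto-encoder trained by gradient descent. All sizes and the toy data are assumptions of mine; a real VAE would additionally sample $z$ from a learned distribution rather than computing it deterministically.

```python
import numpy as np

# Plain linear auto-encoder sketch: encoder x -> z, decoder z -> x_hat,
# trained to minimize the reconstruction error ||x_hat - x||^2.
rng = np.random.default_rng(0)
D, Z = 8, 2                              # data dim, code dim (assumed sizes)
W_enc = rng.normal(scale=0.1, size=(Z, D))
W_dec = rng.normal(scale=0.1, size=(D, Z))

X = rng.normal(size=(200, D)) @ rng.normal(size=(D, D))  # toy correlated data
lr = 1e-3
for step in range(2000):
    Zc = X @ W_enc.T                     # encode: z = W_enc x
    Xh = Zc @ W_dec.T                    # decode: x_hat = W_dec z
    R = Xh - X                           # reconstruction residual
    # Gradients of the mean squared error w.r.t. both weight matrices.
    gW_dec = R.T @ Zc / len(X)
    gW_enc = (R @ W_dec).T @ X / len(X)
    W_dec -= lr * gW_dec
    W_enc -= lr * gW_enc
print(np.mean(R**2))                     # reconstruction error shrinks over training
```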
When $z$ is varied, the generated digit samples change accordingly, which shows that $z$ has captured the underlying features:
[Figure: samples generated by varying $z$]
You may wonder why it is called "variational".
Our goal is to maximize $p_\theta(x)$, and the following holds:
$$\log p_\theta(x) = \mathbb{E}_{z}\!\left[\log \frac{p_\theta(x|z)\,p_\theta(z)}{p_\theta(z|x)}\right]$$
Since $p_\theta(z|x)$ is hard to compute, we use $q(z|x)$ in its place; we will not go into the details here. Both $q(z|x)$ and $p_\theta(x|z)$ are represented by neural networks.
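For completeness, the standard decomposition behind this substitution (a well-known identity, not spelled out in the post) shows what replacing $p_\theta(z|x)$ by $q(z|x)$ buys us: a tractable lower bound plus a nonnegative KL gap.

$$\log p_\theta(x) = \underbrace{\mathbb{E}_{q(z|x)}\!\big[\log p_\theta(x|z)\big] - \mathrm{KL}\!\big(q(z|x)\,\|\,p_\theta(z)\big)}_{\text{ELBO, tractable}} + \underbrace{\mathrm{KL}\!\big(q(z|x)\,\|\,p_\theta(z|x)\big)}_{\ge\,0}$$

Since the last term is nonnegative, maximizing the ELBO maximizes a lower bound on $\log p_\theta(x)$. Optimizing over the distribution $q$ itself is a calculus-of-variations style problem, which is where the name "variational" comes from.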
GAN
Generative adversarial networks are famous enough that the principle needs little explanation: there is a generator (which plays the role of a decoder) and a discriminator. Note that training alternates optimization between the generator and the discriminator (alternating gradient updates).
[Figure: generator and discriminator in a GAN]
At the beginning of training, updating the generator with $\log(1 - D(G(z)))$ gives very slow gradient descent (the gradient saturates while $D$ easily rejects the fakes), so early on $-\log(D(G(z)))$ is used as the generator objective instead, pushing $D(G(z))$ toward 1.
[Figure: the two generator objectives and their gradients]
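Here is a minimal PyTorch sketch of the alternating updates, using the non-saturating $-\log D(G(z))$ generator loss just described. The network sizes, learning rates, and toy "real" data are all assumptions of mine, not anything from the post.

```python
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 2))  # generator
D = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid())
opt_G = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_D = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()

real = torch.randn(64, 2) * 0.5 + 3.0   # toy "real" data cluster
for step in range(1000):
    z = torch.randn(64, 8)
    fake = G(z)

    # 1) Discriminator step: push D(real) -> 1 and D(fake) -> 0.
    d_loss = bce(D(real), torch.ones(64, 1)) + bce(D(fake.detach()), torch.zeros(64, 1))
    opt_D.zero_grad(); d_loss.backward(); opt_D.step()

    # 2) Generator step: the non-saturating loss -log D(G(z)),
    #    i.e. train G so that D labels its samples as real.
    g_loss = bce(D(fake), torch.ones(64, 1))
    opt_G.zero_grad(); g_loss.backward(); opt_G.step()
```

Note the `detach()` in the discriminator step: it keeps the discriminator's loss from updating the generator, which is exactly what "alternating" optimization means.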
Reinforcement learning
Reinforcement learning borrows ideas from Markov chains; in fact, if you are familiar with deep learning, you will find that many models are related to Markov chains in one way or another. Here the learning model is abstracted into a generic agent, and everything it interacts with into an environment.
[Figure: agent-environment interaction loop]
The environment gives the agent a state and a reward, and the agent gives the environment an action. I think of the agent as a game player and the environment as the game machine. This seems reasonable once you know that reinforcement learning models really are used to train game-playing agents, and AlphaGo.
I will go straight to the final formula:
$$\nabla_\theta J(\theta) = \sum_{t=0}^{T}\Big(Q^{\pi_\theta}(s_t, a_t) - V^{\pi_\theta}(s_t)\Big)\,\nabla_\theta \log \pi_\theta(a_t \mid s_t)$$
where $Q^{\pi_\theta}(s_t, a_t)$ quantifies the pair $(s_t, a_t)$, that is, how good it is to take action $a_t$ when the state is $s_t$.
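Here is a minimal sketch of ascending this gradient, using REINFORCE with a running-average baseline standing in for $V(s_t)$ on a toy two-armed bandit; the environment, reward values, and learning rates are all illustrative assumptions of mine.

```python
import numpy as np

rng = np.random.default_rng(0)
theta = np.zeros(2)                    # logits of a softmax policy pi_theta

def pi(theta):
    e = np.exp(theta - theta.max())
    return e / e.sum()

true_reward = np.array([1.0, 2.0])     # arm 1 is the better action
lr, baseline = 0.1, 0.0
for step in range(500):
    p = pi(theta)
    a = rng.choice(2, p=p)             # sample action a_t ~ pi_theta
    r = true_reward[a] + rng.normal()  # one-step return plays the role of Q
    adv = r - baseline                 # (Q - V): the baseline approximates V
    # grad of log pi(a) for a softmax policy over logits: one_hot(a) - p
    grad_logp = -p
    grad_logp[a] += 1.0
    theta += lr * adv * grad_logp      # ascend the policy-gradient estimate
    baseline += 0.05 * (r - baseline)  # running-average value estimate
print(pi(theta))                       # probability mass shifts to arm 1
```

Subtracting the baseline does not bias the gradient; it only reduces its variance, which is why the $Q - V$ (advantage) form is preferred over raw returns.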