Application of artificial intelligence deep learning in Machine Translation
This article gives only a brief introduction to the relevant applications and does not go into formula derivations (some of the pictures in this article are taken from the internet).
1. The development of Machine Translation
Before the 1980s, Machine Translation depended mainly on the development of linguistics, analyzing syntax, semantics, and pragmatics.
After that, researchers began to apply statistical models to Machine Translation, generating translations based on the analysis of existing text corpora.
Since 2012, with the rise of deep learning, neural networks have been applied to Machine Translation and have achieved great success in just a few years.
2. Neural Machine Translation
In 2013, Nal Kalchbrenner and Phil Blunsom proposed a new end-to-end encoder-decoder structure for Machine Translation. In 2014, Sutskever et al. proposed the sequence-to-sequence (seq2seq) learning method; Google implemented this model in an in-depth tutorial for its TensorFlow framework and achieved very good results (see https://www.tensorflow.org/tutorials/seq2seq).
2.1 (Preliminaries) A quick introduction to neural networks
Deep learning (a lofty-sounding name) refers to multi-layer neural networks. See the figure below.
[Figure: a single-layer neural network]
This is a single-layer neural network; a multilayer neural network inserts a number of hidden layers in the middle, each with several nodes, while the input and output layers each remain a single layer.
In traditional programming, we are given the input, specify every step, and obtain the output at the end. A neural network works the other way around: we are given many known input-output pairs, called training samples, while the steps to perform (i.e., the model) are unknown. How do we determine the steps? By "regression/fitting": start from the simplest equation model, a straight line, and fit it to the samples directly, as in the sketch below.
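As a toy illustration of fitting a model from samples (a minimal sketch; the data and the linear model y = w*x + b are made up for illustration):

```python
import numpy as np

# Training samples: known inputs and outputs (made-up data for illustration).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 7.1, 8.8])  # roughly y = 2x + 1

# "Regression/fitting": choose the simplest model, a line y = w*x + b,
# and let least squares determine the unknown "steps" (the coefficients).
w, b = np.polyfit(x, y, deg=1)
print(f"fitted model: y = {w:.2f}*x + {b:.2f}")
```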
Training a neural network is similar: the coefficients in the hidden layers are likewise determined by training, except that the neural network model itself is nonlinear and complex. Terms such as feedforward, error back-propagation, and gradient descent all refer to steps of this training process; the sketch below runs gradient descent on the same toy fitting problem.
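A minimal sketch of those three terms on the toy data above (the learning rate and step count are assumptions chosen for illustration):

```python
import numpy as np

# The same toy data; now fit y = w*x + b by gradient descent,
# the way a neural network's coefficients are actually trained.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 7.1, 8.8])

w, b = 0.0, 0.0   # start from arbitrary coefficients
lr = 0.01         # learning rate (assumed; tune as needed)
for step in range(2000):
    pred = w * x + b                # "feedforward": compute the output
    err = pred - y                  # error against the training samples
    grad_w = 2 * np.mean(err * x)   # "error back-propagation": gradients
    grad_b = 2 * np.mean(err)       # of the mean squared error
    w -= lr * grad_w                # "gradient descent": update coefficients
    b -= lr * grad_b
print(f"learned: y = {w:.2f}*x + {b:.2f}")
```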
2.2 The basic seq2seq model
The seq2seq model is composed of three parts: an Encoder, a Decoder, and the intermediate state vector connecting them. The Encoder learns from the input and encodes it into a fixed-size state vector C; C is then passed to the Decoder, which learns from the state vector C and produces the output.
2.2.1 RNN and LSTM
The Encoder and Decoder usually adopt a variant of the recurrent neural network (RNN) called long short-term memory (Long Short-Term Memory, LSTM). LSTM differs from an ordinary RNN in that it is much better at storing state over long distances. See the figures below.
[Figure: (a) an ordinary RNN]
[Figure: (b) an LSTM]
In an ordinary multilayer neural network (DNN), the hidden state information H is just the independent output of the hidden-layer nodes for each input.
In an RNN, the hidden state h_t at the current time step is affected by the hidden state h_{t-1} from the previous step; that is, an RNN retains part of the earlier context as memory. For Machine Translation, take the input "My coat is white, hers is blue": when an RNN translates the second half of the sentence, the word "coat" in the first half provides useful information. But this memory weakens greatly as the interval grows. The underlying principle is not explained in detail here.
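A single step of an ordinary RNN can be sketched as follows (a minimal illustration; the shapes and the tanh activation are the standard textbook choice, not taken from this article):

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b):
    # The hidden state at time t depends on both the current input x_t
    # and the previous hidden state h_{t-1}: this is the RNN's "memory".
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b)

# Toy sizes: 4-dim inputs, 3-dim hidden state (made up for illustration).
rng = np.random.default_rng(0)
W_xh, W_hh, b = rng.normal(size=(3, 4)), rng.normal(size=(3, 3)), np.zeros(3)

h = np.zeros(3)                      # initial hidden state
for x_t in rng.normal(size=(5, 4)):  # a 5-step input sequence
    h = rnn_step(x_t, h, W_xh, W_hh, b)
print(h)  # final state: a summary of the whole sequence
```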
LSTM uses gated, additive updates (the "adder", i.e., the gating idea) in the cell of each hidden layer to achieve selective memory. Much like our memory of childhood, it remembers selectively, which largely avoids the problem that arises with a plain RNN. In translating "My coat is white, hers is blue", by the time "hers" is reached, the earlier "My coat" information has been preserved by the gate control of the adder.
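The gates can be sketched in the same style (these are the standard LSTM equations: sigmoid gates decide what to forget, what to write, and what to output, and the cell state is updated additively, which is the "adder" idea above):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    # One slice of W per gate: forget (f), input (i), output (o),
    # plus the candidate cell update (g).
    z = W @ np.concatenate([x_t, h_prev]) + b
    f, i, o, g = np.split(z, 4)
    f, i, o, g = sigmoid(f), sigmoid(i), sigmoid(o), np.tanh(g)
    c = f * c_prev + i * g   # additive update: old memory is gated, not overwritten
    h = o * np.tanh(c)       # selective output of the stored memory
    return h, c

# Toy sizes: 4-dim input, 3-dim state (made up for illustration).
rng = np.random.default_rng(0)
W, b = rng.normal(size=(12, 7)), np.zeros(12)
h, c = np.zeros(3), np.zeros(3)
for x_t in rng.normal(size=(5, 4)):
    h, c = lstm_step(x_t, h, c, W, b)
```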
2.2.2 The Encoder-Decoder model
[Figure: the basic seq2seq encoder-decoder structure]
The figure above shows the basic structure of the seq2seq model in Machine Translation. The Encoder accepts the input (for example: "I am a student") and obtains the state information C through the transmission of state along the sequence. C is then fed into the Decoder to produce the translated output. A minimal sketch of this structure follows.
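A minimal Keras sketch of this structure (the vocabulary sizes and dimensions are made up for illustration; this is a toy version of the kind of model the TensorFlow tutorial builds, not the tutorial's own code):

```python
import tensorflow as tf

# Assumed toy sizes: source/target vocabularies and state dimension.
src_vocab, tgt_vocab, dim = 1000, 1000, 64

# Encoder: read the source sequence, keep only the final LSTM state as C.
enc_in = tf.keras.Input(shape=(None,), dtype="int32")
enc_emb = tf.keras.layers.Embedding(src_vocab, dim)(enc_in)
_, state_h, state_c = tf.keras.layers.LSTM(dim, return_state=True)(enc_emb)
C = [state_h, state_c]  # the fixed-size state vector C

# Decoder: start from C and generate the target sequence step by step.
dec_in = tf.keras.Input(shape=(None,), dtype="int32")
dec_emb = tf.keras.layers.Embedding(tgt_vocab, dim)(dec_in)
dec_out, _, _ = tf.keras.layers.LSTM(
    dim, return_sequences=True, return_state=True)(dec_emb, initial_state=C)
logits = tf.keras.layers.Dense(tgt_vocab)(dec_out)

model = tf.keras.Model([enc_in, dec_in], logits)
model.summary()
```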
This model has a problem when applied to Machine Translation: the Decoder receives only the single piece of information C. If the sentence is "I am a student", then when translating "student" the Decoder should not need to attend to the earlier "I am"; and if the sentence is very long, C has limited capacity, so it is hard for it to retain all the information. We therefore want the Decoder to focus on different parts of the Encoder's output, as in the following figure.
[Figure: an encoder-decoder with attention, where each decoding step receives its own context vector C_i]
In this way, the Decoder receives different state information at different steps of the translated sequence. This is the Attention mechanism.
2.3 The Attention mechanism
Google's TensorFlow framework uses the attention mechanism proposed by Luong in 2015. The C_i in the figure above can be represented as a weighted sum of the Encoder states h_i, and the weight parameters w_i can themselves be determined by training a small neural network. The introduction of the Attention mechanism greatly improved the accuracy of Machine Translation. The core computation is sketched below.
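A minimal illustration of the weighted sum (the encoder and decoder states are made up; real Luong attention learns its scoring function, and a plain dot product stands in for it here):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Toy encoder states h_1..h_4 and one decoder state (made up for illustration).
rng = np.random.default_rng(0)
H_enc = rng.normal(size=(4, 8))   # one row per source position
h_dec = rng.normal(size=8)        # decoder state at the current target step

scores = H_enc @ h_dec            # how relevant is each source position?
w = softmax(scores)               # attention weights w_i, summing to 1
c = w @ H_enc                     # context C_i: weighted sum of encoder states
print(w, c.shape)
```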
3. A contest between Facebook and Google
In May 2017, Facebook was the first to apply convolutional neural networks (CNNs, now popular in computer vision, with a pile of formulas behind them...) to Machine Translation. Unlike RNNs, CNNs can be parallelized while retaining a number of the RNN's advantages, so the model (named Fairseq) trained much faster (up to 9 times) and translation accuracy also improved.