Application of deep learning in OCR text classification
Introduction
Text classification is a very common and widely studied task in NLP, and there is already a large body of work on it: feature-based SVM classifiers are a very common approach, as are Naive Bayes classifiers, maximum-entropy classification built on dependency trees constructed with CRFs, and of course the familiar BP neural network classifier. In the traditional bag-of-words model, converting text into a vector often produces a very high-dimensional representation; there are ways to reduce the dimensionality, but because these approaches discard word-order information during training, the classification results are not always satisfactory. This article is mainly a study of several deep learning text classification papers [1,2,3,4,5] and a related blog post [6].
Background
The deep learning methods covered in this blog post are mainly RNNs and CNNs. Papers [1,2,3,4,5] in fact mainly use CNNs for text modeling and classification, while paper [3] also mentions using an RNN to train text vectors; for simplicity of description, I use "deep learning" to refer collectively to the classification methods used in this article.
The reason CNNs can be widely used in text classification is actually quite simple: the filter window of a CNN can essentially be viewed as an N-gram, so the CNN is closely related to the N-gram model; at the same time, thanks to the convolution and pooling layers, a CNN greatly reduces the number of parameters to train while still extracting higher-level text features. RNNs, by contrast, are used more for language modeling and machine translation, and seem to be applied less often to text classification.
CNN for text categorization
I will use papers [1,2] to illustrate the application of CNNs to text classification. The two papers were published very close together, both in 2014.
First, let's look at the details of paper [1] (Convolutional Neural Networks for Sentence Classification).
Here is the author's CNN architecture:
CNN structure diagram
Let me explain the figure above.
The input layer on the left contains two channels. Each channel is a two-dimensional matrix whose number of rows equals the length of the sentence (i.e., the number of words; sentences are padded so that every sentence in the classification task has the same length), and each row is the vector representation of one word. In this paper the author initializes each word's embedding with the word2vec tool. The two channels are identical at initialization time, but they serve different purposes: one channel is static, meaning its embedding values are fixed after initialization and never change, while the other channel is non-static, meaning its embedding vectors are treated as parameters and are updated during backpropagation. The motivation for using two channels is twofold: first, if only the static channel were used, a mismatch between the corpus used to train word2vec and the corpus used in the experiments could bias the embeddings; second, if only the non-static channel were used, the initialization would strongly influence convergence and the final result. Mixing the two channels therefore "neutralizes" both problems.
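As a concrete illustration, here is a minimal PyTorch sketch of the two-channel embedding idea; the class name and the `pretrained` weight tensor are assumptions for illustration, not code from the paper.

```python
# Minimal sketch of the static + non-static embedding channels (illustrative only).
# `pretrained` is assumed to be a (vocab_size, embed_dim) tensor of word2vec vectors.
import torch
import torch.nn as nn

class TwoChannelEmbedding(nn.Module):
    def __init__(self, pretrained: torch.Tensor):
        super().__init__()
        # static channel: weights are frozen and never updated
        self.static = nn.Embedding.from_pretrained(pretrained, freeze=True)
        # non-static channel: same initialization, but fine-tuned by backprop
        self.non_static = nn.Embedding.from_pretrained(pretrained, freeze=False)

    def forward(self, token_ids):             # token_ids: (batch, seq_len)
        a = self.static(token_ids)             # (batch, seq_len, embed_dim)
        b = self.non_static(token_ids)
        # stack the two views into two "channels", like a 2-channel image
        return torch.stack([a, b], dim=1)      # (batch, 2, seq_len, embed_dim)
```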
The layer after the input layer is the convolution layer. In the figure above, the topmost filter has shape 3*6. For the example sentence "wait for the video and do n't rent it" (9 tokens after tokenization), this filter performs one convolution operation on every window of three consecutive words; sliding over the 9-word sentence, it produces a 7*1 output feature map. Of course, both the number of filters in the convolution layer and the shape of each filter can be changed; the principle stays the same.
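A small hedged sketch of this step, with embed_dim = 6 and seq_len = 9 chosen only to mirror the numbers above:

```python
# One filter spanning 3 words slides over a 9-word sentence (illustrative shapes).
import torch
import torch.nn as nn

embed_dim, seq_len = 6, 9
x = torch.randn(1, 2, seq_len, embed_dim)       # (batch, channels, words, embed_dim)

# a single filter covering 3 words and the full embedding width
conv = nn.Conv2d(in_channels=2, out_channels=1, kernel_size=(3, embed_dim))
feature_map = conv(x)
print(feature_map.shape)                         # torch.Size([1, 1, 7, 1]): 9 - 3 + 1 = 7
```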
The next layer is the pooling layer. The paper uses max-pooling, meaning that each 7*1 feature map produced by the convolution layer is pooled down to a single 1*1 value. With n filters, this yields n 1*1 values, which are then fed into the fully connected layer.
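For illustration, a max-over-time pooling sketch under the same assumed shapes:

```python
# Each filter's 7*1 feature map collapses to one value (n_filters is illustrative).
import torch

n_filters = 100
feature_maps = torch.randn(1, n_filters, 7)      # (batch, filters, positions)
pooled, _ = feature_maps.max(dim=2)               # (1, n_filters): one value per filter
```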
Finally there is a fully connected output layer, whose number of output units corresponds to the number of text categories. The n pooled values are connected to this output layer, and the output layer uses the softmax activation function.
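A short sketch of this output layer; n_filters and n_classes are illustrative values, not taken from the paper:

```python
# The n pooled values feed a fully connected layer sized to the number of classes;
# softmax turns the resulting scores into class probabilities.
import torch
import torch.nn as nn

n_filters, n_classes = 100, 2
fc = nn.Linear(n_filters, n_classes)

pooled = torch.randn(1, n_filters)                # output of the pooling layer
probs = torch.softmax(fc(pooled), dim=1)          # (1, n_classes), rows sum to 1
```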
From the above description we can see that the idea behind CNN-based classification is very clear, and it is not difficult to implement. I will not go into the training of the parameters here. The experimental results are given later in the article.
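To tie the pieces together, here is an end-to-end PyTorch sketch in the spirit of this architecture; the filter sizes (3, 4, 5), 100 filters per size, and the 0.5 dropout rate are common choices added for illustration and are not read off the figure above.

```python
# Hedged end-to-end assembly of the layers described above (illustrative settings).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TextCNN(nn.Module):
    def __init__(self, pretrained, n_classes, filter_sizes=(3, 4, 5), n_filters=100):
        super().__init__()
        embed_dim = pretrained.size(1)
        self.static = nn.Embedding.from_pretrained(pretrained, freeze=True)
        self.non_static = nn.Embedding.from_pretrained(pretrained, freeze=False)
        self.convs = nn.ModuleList(
            nn.Conv2d(2, n_filters, kernel_size=(h, embed_dim)) for h in filter_sizes
        )
        self.dropout = nn.Dropout(0.5)
        self.fc = nn.Linear(n_filters * len(filter_sizes), n_classes)

    def forward(self, token_ids):                          # (batch, seq_len)
        x = torch.stack([self.static(token_ids),
                         self.non_static(token_ids)], dim=1)  # (batch, 2, seq_len, dim)
        pooled = []
        for conv in self.convs:
            fmap = F.relu(conv(x)).squeeze(3)               # (batch, n_filters, positions)
            pooled.append(F.max_pool1d(fmap, fmap.size(2)).squeeze(2))
        out = self.dropout(torch.cat(pooled, dim=1))        # (batch, n_filters * n_sizes)
        return self.fc(out)                                 # class scores (softmax in the loss)
```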
Next, let's look at the discussion in paper [2] (Effective Use of Word Order for Text Categorization with Convolutional Neural Networks).
With the above foundation, the idea of paper [2] is easier to understand. In its preprocessing of text into vectors, [2] actually looks a little cruder: it uses the one-hot model directly, but with some improvements. The main difference lies in how word vectors are represented. In the paper, the author first uses one-hot word vectors directly, which he calls the seq-CNN model; since this obviously causes the dimensionality to grow heavily, the author then proposes a modified bow-CNN model that builds a single vector from several neighboring consecutive words. The differences are as follows (a small sketch of the two region encodings is given after the figures):
Seq-CNN model
Bow-CNN model
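To make the difference concrete, here is a small, hypothetical sketch of the two region encodings; the toy vocabulary and the 3-word region are made up for illustration and are not taken from the paper.

```python
# Toy sketch of seq-CNN vs. bow-CNN region encodings (illustrative vocabulary).
import numpy as np

vocab = {"i": 0, "love": 1, "this": 2, "movie": 3}
V = len(vocab)

def one_hot(word):
    v = np.zeros(V)
    v[vocab[word]] = 1.0
    return v

region = ["love", "this", "movie"]                # a window of 3 consecutive words

# seq-CNN: concatenate the one-hot vectors, preserving word order (size 3 * |V|)
seq_vec = np.concatenate([one_hot(w) for w in region])

# bow-CNN: merge them into one bag-of-words vector over the region (size |V|)
bow_vec = np.sum([one_hot(w) for w in region], axis=0)

print(seq_vec.shape, bow_vec.shape)               # (12,) (4,)
```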