Analysis of OCR recognition
Research background
Optical character recognition (hereafter OCR) refers to converting the text in an image into text content that a computer can edit. Researchers have studied the related technology for a long time, and there are many mature OCR technologies and products, such as Hanvon OCR, ABBYY FineReader, and Tesseract OCR. It is worth mentioning that ABBYY FineReader not only achieves a high recognition accuracy (including on Chinese), but also largely preserves the original typesetting; it is a very powerful commercial OCR product.
However, among the many OCR products, all except Tesseract OCR are closed-source commercial software: we cannot embed them into our own programs, nor improve upon them. The only remaining choice is Google's open-source Tesseract OCR, but its recognition performance is not very good; in particular, its accuracy on Chinese is low and needs further improvement.
To sum up, whether for academic research or practical application, it is necessary to carry out research on and improve OCR technology. Our team divides the complete OCR system into four parts, "feature extraction", "text localization", "optical recognition", and "language model", processes them step by step, and finally builds a usable, complete OCR system for printed text. The system can initially be applied to recognizing the text in pictures on e-commerce and WeChat platforms, in order to judge the authenticity of the information.
Research hypothesis
In this article, we assume that the text in the image has the following characteristics:
1. The fonts to be recognized are relatively standard printed fonts, such as Arial, bold, italic, and script fonts;
2. The text and the background should have a reasonable degree of contrast;
3. In the design of the model, we assume that the text in the picture is typeset horizontally;
4. The strokes of the characters should have a certain width and must not be too thin;
5. The color within the same piece of text should be basically uniform;
6. Ordinary text consists of dense strokes, which often have a certain connectivity.
It can be seen that these are common characteristics of typical commercial posters such as e-commerce posters, so these assumptions are quite reasonable.
Analysis process
Figure 1: our experimental flow chart
Feature extraction
As the first step of the OCR system, feature extraction finds candidate text-region features, so that text localization can be carried out in the second step and recognition in the third. In this part we focus on simulating how the naked eye processes images and Chinese characters, and we take an innovative approach to image processing and character localization. This work is the core of the whole OCR system, and the core of our contribution.
Most traditional text-segmentation approaches follow the idea of "edge detection + erosion/dilation + connected-region detection", as in paper [1]. However, performing edge detection on a complex background produces too many background edges (i.e. more noise), while some of the text's edge information is easily lost, which worsens the result.
If erosion or dilation is then applied, the background regions stick to the text regions and the result deteriorates further. In fact, we went quite far down this road, and even wrote our own edge-detection function. After many tests, we finally decided to abandon this line of thinking.
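For reference, the traditional pipeline described above (edge detection followed by dilation) can be sketched in pure numpy. This is an illustrative toy on a synthetic image, not the edge-detection function we actually wrote:

```python
import numpy as np

def edges(gray: np.ndarray, thresh: float = 50.0) -> np.ndarray:
    """Crude edge map: mark pixels whose horizontal or vertical
    gradient magnitude exceeds a threshold."""
    gx = np.abs(np.diff(gray, axis=1, prepend=gray[:, :1]))
    gy = np.abs(np.diff(gray, axis=0, prepend=gray[:1, :]))
    return np.maximum(gx, gy) > thresh

def dilate(mask: np.ndarray) -> np.ndarray:
    """Binary dilation with a 3x3 square: shift the mask one pixel
    in each direction and OR the copies together.
    (np.roll wraps at the borders, which is fine for this toy.)"""
    out = mask.copy()
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            out |= np.roll(np.roll(mask, dy, axis=0), dx, axis=1)
    return out

# A dark horizontal "stroke" on a light background.
img = np.full((7, 7), 255.0)
img[3, 2:5] = 0.0
mask = dilate(edges(img))
print(int(mask.sum()))  # dilated edge pixels around the stroke
```

On a real poster the edge map of the background grows much faster than that of the strokes, which is exactly the noise problem that made us abandon this route.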
Therefore, in this paper we give up edge detection and erosion/dilation, and instead obtain fairly good local text features through steps such as clustering, segmentation, and denoising; the whole process is shown in Figure 2. The resulting characters can be fed directly into the text-recognition model without additional processing. Because each intermediate result has a corresponding theoretical basis as support, the model's effectiveness can be guaranteed.
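The clustering step is not spelled out here, but a minimal version, assumed purely for illustration, is two-cluster k-means on the gray levels, splitting pixels into a darker "text" cluster and a lighter "background" cluster:

```python
import numpy as np

def kmeans_1d(values: np.ndarray, iters: int = 20) -> np.ndarray:
    """Two-cluster k-means on scalar gray values.

    Returns a boolean mask: True for the darker cluster
    (assumed here to be text on a light background)."""
    c_dark, c_light = values.min(), values.max()
    for _ in range(iters):
        # Assign each pixel to the nearer centroid.
        dark = np.abs(values - c_dark) <= np.abs(values - c_light)
        if dark.all() or (~dark).all():
            break  # degenerate split; stop early
        c_dark = values[dark].mean()
        c_light = values[~dark].mean()
    return dark

# Dark text pixels (~10) against a bright background (~245).
gray = np.array([[10., 12., 240.], [11., 250., 245.]])
text_mask = kmeans_1d(gray.ravel()).reshape(gray.shape)
print(text_mask)
```

Segmentation and denoising would then operate on `text_mask` rather than on an edge map, which is the key difference from the traditional pipeline.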
Figure 2: the feature extraction process
In this part of the experiment we demonstrate our results on Figure 3. The characteristics of this image are medium size, a bright background, rich colors, and mixed typesetting of pictures and text with no fixed layout; it is a typical commercial promotional picture. As you can see, the key to processing this picture is to distinguish the picture regions from the text regions, recognize and exclude the rice cooker on the right side, and keep only the text regions.
Figure 3: introduction to the Xiaomi rice cooker
Preprocessing of images
First, we read the original image as a grayscale image and obtain an m × n gray matrix, where m and n are the height and width of the image. Reading the image this way has lower dimensionality than reading the RGB color image directly, with no significant loss of text information. Converting to grayscale actually merges the three channels of the original RGB image into a single channel with the standard luminance formula

Gray = 0.299 × R + 0.587 × G + 0.114 × B (1)
The grayscale map of Figure 3
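Formula (1) can be applied directly to the channels of an RGB array. A minimal numpy sketch, using the standard luminance coefficients 0.299/0.587/0.114 on a synthetic image:

```python
import numpy as np

# Synthetic RGB image: shape (m, n, 3), values in [0, 255].
rgb = np.zeros((4, 4, 3), dtype=np.float64)
rgb[..., 0] = 200.0  # R channel
rgb[..., 1] = 100.0  # G channel
rgb[..., 2] = 50.0   # B channel

# Formula (1): merge the three channels into one gray channel.
weights = np.array([0.299, 0.587, 0.114])
gray = rgb @ weights  # shape (m, n)

print(gray.shape)            # (4, 4)
print(round(gray[0, 0], 1))  # 0.299*200 + 0.587*100 + 0.114*50 = 124.2
```

The resulting m × n matrix is exactly the gray matrix the rest of the pipeline operates on.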
The image itself is not large. If it were processed directly, the strokes of the text would be too thin and easily removed as noise. Therefore, to ensure that the strokes have a certain thickness, we first enlarge the picture; in our experiment, the image is enlarged to twice its original size.
However, after the image is magnified, the contrast between the text and the background decreases, because interpolation algorithms fill in the missing pixels during magnification. The contrast therefore needs to be increased accordingly, which we do by raising the gray values to a power. After testing, a power of 2 works for most images.
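The two preprocessing steps above, 2× enlargement followed by a power transform to restore contrast, can be sketched in pure numpy. Nearest-neighbour enlargement is assumed here for simplicity; the actual experiment may use a smoother interpolation:

```python
import numpy as np

def enlarge_2x(gray: np.ndarray) -> np.ndarray:
    """Enlarge a gray image to twice its size (nearest-neighbour)."""
    return np.repeat(np.repeat(gray, 2, axis=0), 2, axis=1)

def power_contrast(gray: np.ndarray, power: float = 2.0) -> np.ndarray:
    """Increase contrast by raising normalized gray values to a power.

    Values are normalized to [0, 1] first, so a power of 2 darkens
    mid-tones and pulls text and background further apart."""
    norm = gray / 255.0
    return (norm ** power) * 255.0

img = np.array([[0.0, 128.0], [128.0, 255.0]])
big = enlarge_2x(img)
print(big.shape)            # (4, 4) -- twice the original 2x2
out = power_contrast(big)
print(round(out[0, 2], 1))  # mid-gray 128 is pushed down to ~64.3
```

Note that the power transform leaves pure black (0) and pure white (255) fixed while suppressing the interpolated mid-gray values, which is exactly the discrimination lost during magnification.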