Deep learning based OCR recognition technology
Deep learning OCR is a two-step process:
1. Detection: find the regions that contain text or numbers (region proposals);
2. Classification: identify the text or numbers within each region.
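The two steps can be sketched as a small pipeline. This is only an illustration of the control flow: `run_ocr`, `crop`, `toy_detect` and `toy_classify` are hypothetical names, and the toy functions stand in for trained detection and classification models.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# (left, top, right, bottom) in pixel coordinates
Box = Tuple[int, int, int, int]
# An image as rows of pixel values; a stand-in for a real image array
Image = List[List[int]]


@dataclass
class OcrResult:
    box: Box
    text: str


def crop(image: Image, box: Box) -> Image:
    """Cut the region described by box out of the image."""
    left, top, right, bottom = box
    return [row[left:right] for row in image[top:bottom]]


def run_ocr(
    image: Image,
    detect: Callable[[Image], List[Box]],
    classify: Callable[[Image], str],
) -> List[OcrResult]:
    """Step 1: detect text regions. Step 2: classify each cropped region."""
    results = []
    for box in detect(image):
        region = crop(image, box)
        results.append(OcrResult(box=box, text=classify(region)))
    return results


# Hypothetical stand-ins for trained models, used only to exercise the flow.
def toy_detect(image: Image) -> List[Box]:
    return [(0, 0, 2, 2), (2, 0, 4, 2)]


def toy_classify(region: Image) -> str:
    # Pretend the pixel sum of the region encodes a digit.
    return str(sum(sum(row) for row in region) % 10)


image = [[1, 1, 2, 2], [1, 1, 2, 2]]
results = run_ocr(image, toy_detect, toy_classify)
print([(r.box, r.text) for r in results])
```

Passing the detector and classifier in as functions keeps the two steps independent, so either one can be swapped without touching the other.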
Detection approaches for deep learning OCR:
1. Faster R-CNN family: region-based object detection. It offers high precision, but is slow;
2. YOLO family: regression-based object detection. It is fast, but less precise.
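To make "regression-based" concrete, here is a minimal sketch of how a grid cell's regressed values might be turned into a pixel box. It assumes a parameterization in the style of the first YOLO paper (center offsets relative to the cell, width and height relative to the whole image); the text above does not give these details, and `decode_grid_box` and its parameters are hypothetical names.

```python
from typing import Tuple


def decode_grid_box(
    row: int,
    col: int,
    x_offset: float,
    y_offset: float,
    rel_width: float,
    rel_height: float,
    grid_size: int,
    image_width: int,
    image_height: int,
) -> Tuple[float, float, float, float]:
    """Convert one cell's regressed values to (left, top, right, bottom) pixels.

    x_offset, y_offset: box center inside the cell, each in [0, 1].
    rel_width, rel_height: box size as a fraction of the whole image.
    """
    center_x = (col + x_offset) / grid_size * image_width
    center_y = (row + y_offset) / grid_size * image_height
    width = rel_width * image_width
    height = rel_height * image_height
    return (
        center_x - width / 2,
        center_y - height / 2,
        center_x + width / 2,
        center_y + height / 2,
    )


# Assumed example values: a 7x7 grid over a 448x448 image.
box = decode_grid_box(
    row=3, col=3,
    x_offset=0.5, y_offset=0.5,
    rel_width=0.25, rel_height=0.125,
    grid_size=7, image_width=448, image_height=448,
)
print(box)
```

The network outputs the box numbers directly in a single pass, with no separate proposal stage, which is where the speed advantage comes from.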
Classification approaches for deep learning OCR:
1. Multi-digit number classification, proposed by Ian Goodfellow et al. in 2013 ([1312.6082] Multi-digit Number Recognition from Street View Imagery using Deep Convolutional Neural Networks), based on a deep CNN. The disadvantage is that the maximum sequence length must be fixed in advance, so the method is better suited to house numbers or license plate numbers (few characters, each of which can be treated as independent);
2. RNN/LSTM/GRU + CTC. CTC was first proposed by Alex Graves in 2006 for speech recognition. The advantage is that it can produce text of variable length, and the recurrent nature of the model lets it learn dependencies between characters. The disadvantage is lower computational efficiency than a CNN;
3. Attention mechanisms, which can be divided into hard attention and soft attention. Hard attention directly outputs a hard location, usually a bounding box; the advantage is that it is intuitive, and the disadvantage is that it cannot be trained directly with backpropagation (BP). Soft attention is usually used in an RNN/LSTM/GRU encoder-decoder model and can be trained with backpropagation.
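Two of the ideas above can be sketched in a few lines of plain Python. The first function is greedy (best-path) CTC decoding: take the top label per frame, merge repeats, then drop blanks. The second is soft attention with dot-product scoring. Both are simplified illustrations. The blank index, the alphabet layout, the dot-product score and all function names are assumptions, not details given in the text.

```python
import math
from typing import List, Sequence, Tuple

BLANK = 0  # assumed index of the CTC blank label


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]], alphabet: str) -> str:
    """Best-path CTC decoding: argmax per frame, merge repeats, drop blanks."""
    decoded = []
    previous = BLANK
    for scores in frame_scores:
        label = max(range(len(scores)), key=lambda i: scores[i])
        if label != previous and label != BLANK:
            # Label k maps to alphabet[k - 1] because index 0 is the blank.
            decoded.append(alphabet[label - 1])
        previous = label
    return "".join(decoded)


def soft_attention(
    query: Sequence[float], encoder_states: Sequence[Sequence[float]]
) -> Tuple[List[float], List[float]]:
    """Dot-product soft attention: softmax weights and the weighted context."""
    scores = [sum(q * h for q, h in zip(query, state)) for state in encoder_states]
    max_score = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(encoder_states[0])
    context = [
        sum(w * state[d] for w, state in zip(weights, encoder_states))
        for d in range(dim)
    ]
    return weights, context


# Seven frames over the labels (blank, 'a', 'b'); the top labels are 1,1,0,1,2,2,0.
frames = [
    [0.1, 0.8, 0.1],
    [0.2, 0.7, 0.1],
    [0.9, 0.05, 0.05],
    [0.1, 0.6, 0.3],
    [0.1, 0.2, 0.7],
    [0.2, 0.1, 0.7],
    [0.8, 0.1, 0.1],
]
decoded_text = ctc_greedy_decode(frames, alphabet="ab")
print(decoded_text)

weights, context = soft_attention([2.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
print(weights, context)
```

Every operation in `soft_attention` is differentiable, which is why soft attention can be trained with backpropagation. Hard attention would replace the weighted average with a discrete choice of one location, and that choice has no useful gradient.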