Face detection CNN
(1) Network cascade
The following diagram shows the overall flow chart of the scheme, which is clearly a three-stage cascade (12-net, 24-net, 48-net).
[Figure: overall flow chart of the three-stage cascade]
The principles and benefits of cascading:
1. The first-stage network can be very simple, with a loosely set threshold, so it sweeps away a large number of non-face windows cheaply while maintaining a high recall rate.
2. The final-stage network is comparatively complex by design to guarantee accuracy, but since it only has to process the few windows that survive the earlier stages, overall efficiency is preserved.
3. The cascade idea thus lets us combine individually weak classifiers into a strong detector while still providing an efficiency guarantee.
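The filtering logic of the three points above can be sketched as a generic cascade loop. The scoring functions and thresholds below are toy stand-ins for the real 12-net / 24-net / 48-net classifiers, just to show how each stage only sees the survivors of the previous one:

```python
# Minimal sketch of the cascade idea with stand-in scoring functions.

def run_cascade(windows, stages):
    """Pass candidate windows through (score_fn, threshold) stages in order.

    Early stages use loose thresholds to reject many non-faces cheaply
    while keeping recall high; later, more expensive stages only see
    the windows that survived.
    """
    surviving = windows
    for score_fn, threshold in stages:
        surviving = [w for w in surviving if score_fn(w) >= threshold]
    return surviving

# Toy example: windows are numbers, "faceness" is the value itself.
windows = [0.1, 0.3, 0.5, 0.7, 0.9]
stages = [
    (lambda w: w, 0.2),   # cheap first stage, loose threshold
    (lambda w: w, 0.4),   # intermediate stage
    (lambda w: w, 0.8),   # expensive final stage, strict threshold
]
print(run_cascade(windows, stages))  # → [0.9]
```

Each stage's list comprehension shrinks the candidate set, which is why the later, costlier classifiers stay affordable.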
The following figure shows the number of windows remaining after each of the three stages and the corresponding recall rate:
[Figure: windows remaining after each stage and the corresponding recall]
(2) Multi-scale features
Below is the detailed structure of the three networks:
[Figure: detailed structure of the three networks]
As the figure shows, the first two networks are very simple and only the third is relatively complex. But that is not the main point; the key is the multi-scale feature combination seen in the figure.
Take the second-stage 24-net as an example. First, the windows surviving the previous stage are resized to 24×24 and fed into the network to obtain the fully connected layer features. At the same time, the same window is passed through 12-net, and its fully connected layer features are concatenated with those of 24-net. Finally, softmax classification is performed on the combined features.
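A minimal sketch of this feature concatenation, assuming each sub-network exposes its fully connected activations as a vector. The 128-d and 16-d feature sizes and the random weights are stand-ins for illustration, not the paper's actual dimensions:

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(z - z.max())
    return e / e.sum()

# Hypothetical feature sizes: 24-net fc -> 128-d, 12-net fc -> 16-d.
rng = np.random.default_rng(0)
W = rng.standard_normal((2, 128 + 16))   # stand-in softmax weights
b = np.zeros(2)

def classify_24net(fc24_feat, fc12_feat):
    """Concatenate the 24-net and 12-net fully connected features,
    then do a 2-way (face / non-face) softmax on the combined vector."""
    feat = np.concatenate([fc24_feat, fc12_feat])
    return softmax(W @ feat + b)

probs = classify_24net(rng.standard_normal(128), rng.standard_normal(16))
```

The point is only the `concatenate` step: the coarser 12-net features ride along with the 24-net features into one joint classifier.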
The following figure shows the difference between using and not using multi-scale features:
[Figure: detection performance with vs. without multi-scale features]
The graph shows that, under the same conditions, multi-scale features yield a higher recall rate; in other words, they improve the network's classification ability.
(3) Calibration network: turning regression into classification
Note in the overall flow chart that the output of each classification network passes through a corresponding calibration network before being sent to the next stage.
The calibration network addresses inaccurate localization of the detection windows.
[Figure: box before (blue) and after (red) calibration]
In the figure above, the blue box is the output of the classification network (e.g. 12-net), and the red box is the output after correction by the corresponding 12-calibration-net.
For bounding-box calibration we need only three parameters: a horizontal translation x_n, a vertical translation y_n, and a width-height scale s_n. That is, the box coordinates (x, y, w, h) are adjusted to:
(x - x_n*w/s_n, y - y_n*h/s_n, w/s_n, h/s_n)
Intuitively this is a regression problem: we need to regress three parameters. But continuous regression is hard to train, so the paper converts it into a discrete classification problem by enumerating a set of candidate values for each of the three parameters:
s_n ∈ {0.83, 0.91, 1.0, 1.10, 1.21}, x_n ∈ {-0.17, 0, 0.17}, y_n ∈ {-0.17, 0, 0.17}
Therefore, our goal becomes training a 45-class classifier (5 × 3 × 3 combinations); after all, classification is where neural networks are strongest.
However, a single predicted class is not very reliable. Therefore, the paper averages the parameters of the several highest-scoring classes and uses the result as the final correction.
[Figure: final box obtained by averaging the high-scoring calibration classes]
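Putting the calibration steps together: the snippet below enumerates the 45 (s_n, x_n, y_n) patterns from the paper, averages the high-scoring ones, and applies the box adjustment. The score threshold of 0.5 is an illustrative choice, not the paper's value:

```python
from itertools import product

# Discrete calibration patterns: 5 scales x 3 x-shifts x 3 y-shifts = 45.
S = [0.83, 0.91, 1.0, 1.10, 1.21]
X = [-0.17, 0.0, 0.17]
Y = [-0.17, 0.0, 0.17]
PATTERNS = list(product(S, X, Y))   # 45 (s_n, x_n, y_n) tuples

def calibrate(box, scores, threshold=0.5):
    """Average the (s, x, y) of high-scoring classes, then adjust the box.

    box    : (x, y, w, h) output by the classification net
    scores : 45 confidence scores from the calibration net
    """
    chosen = [p for p, sc in zip(PATTERNS, scores) if sc > threshold]
    if not chosen:  # fall back to the single best class
        chosen = [max(zip(PATTERNS, scores), key=lambda t: t[1])[0]]
    s = sum(p[0] for p in chosen) / len(chosen)
    xn = sum(p[1] for p in chosen) / len(chosen)
    yn = sum(p[2] for p in chosen) / len(chosen)
    x, y, w, h = box
    return (x - xn * w / s, y - yn * h / s, w / s, h / s)
```

For example, a one-hot score on the identity pattern (s_n = 1.0, x_n = y_n = 0) leaves the box unchanged.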
(4) Cascade training method
To match the cascade structure, the paper adopts a stage-wise training strategy.
[Figure: stage-wise training pipeline]
1. Prepare positive and negative samples in the usual way and train the first-stage 12-net and 12-calibration-net.
2. Run the first-stage networks for face detection on the AFLW data set, and set the threshold T1 so that 99% recall is maintained.
3. Take the non-face windows on AFLW that were misjudged as faces as negative samples, and all true faces as positive samples, to train the second-stage 24-net and 24-calibration-net.
4. Repeat steps 2 and 3 to train the final stage.
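The four steps above can be sketched as a toy loop. Here the "network" is simplified to a scalar faceness score with a threshold, so only the procedure is illustrated — picking the threshold at 99% recall and mining hard negatives — not the CNN training itself (in the real pipeline a new network is trained on the positives plus the mined negatives at each stage):

```python
# Toy sketch of stage-wise cascade training with hard-negative mining.

def choose_threshold(scores_pos, recall=0.99):
    """Pick the largest threshold that still keeps `recall` of positives."""
    ranked = sorted(scores_pos, reverse=True)
    keep = max(1, int(recall * len(ranked)))
    return ranked[keep - 1]

def train_cascade(face_scores, negative_scores, n_stages=3):
    stages, hard_negs = [], list(negative_scores)
    for _ in range(n_stages):
        t = choose_threshold(face_scores)   # stage threshold at 99% recall
        stages.append(t)
        # Negatives still passing every stage so far are the "hard"
        # false positives kept as the next stage's negative samples.
        hard_negs = [n for n in hard_negs if all(n >= s for s in stages)]
    return stages, hard_negs

faces = [i / 100 for i in range(1, 101)]     # toy positive scores
stages, hard = train_cascade(faces, [0.5, 0.001, 0.03])
```

The key design point mirrors step 3: only windows the current cascade misjudges as faces survive into the next stage's negative set, so each stage trains on progressively harder examples.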