Face detection for face recognition: Cascade CNN
These are notes on the CVPR 2015 paper "A Convolutional Neural Network Cascade for Face Detection". It is fairly early work, but a classic.
1. Introduction
The cascade in this paper chains together a series of shallow networks, which effectively reduces the amount of CNN computation.
The classifiers are trained directly from image pixels instead of relying on hand-crafted features (before 2015, most detectors extracted hand-crafted features first).
Features:
Background regions are rejected quickly at the low-resolution 12-net stage, and faces are then detected at the higher-resolution 24-net stage. (This sentence becomes clearer after reading the network structure below.)
2. CNN Cascade
2.1. Overview
(1) First, scan the whole image with a 12x12 detection window. Each window is fed into 12-net, which rejects about 90% of the detection windows.
Then use 12-calibration-net to process the remaining windows and adjust their size and position.
Highly overlapping detection windows are eliminated by NMS (non-maximum suppression).
(2) After calibration, the remaining windows no longer have a fixed size (and are not 24x24), so each one is cropped out, resized to 24x24, and sent to 24-net, which rejects further low-confidence windows.
Then calibrate the survivors with 24-calibration-net.
Apply NMS again in the same way.
(3) Then resize the remaining windows to 48x48 and classify them with 48-net.
Apply NMS again, this time across all image scales.
Finally, use 48-calibration-net to calibrate the windows and output the bboxes.
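The NMS step that follows each stage can be sketched as below. This is a minimal greedy, IoU-based version in plain Python. The names `iou` and `nms`, the `(x, y, w, h, score)` box layout, and the 0.5 overlap threshold are assumptions made for illustration, not values taken from the paper.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x, y, w, h, score)."""
    ax1, ay1, ax2, ay2 = a[0], a[1], a[0] + a[2], a[1] + a[3]
    bx1, by1, bx2, by2 = b[0], b[1], b[0] + b[2], b[1] + b[3]
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, overlap_threshold=0.5):
    """Greedy NMS: keep the best-scoring box, drop boxes overlapping it too much."""
    remaining = sorted(boxes, key=lambda b: b[4], reverse=True)
    kept = []
    while remaining:
        best = remaining.pop(0)
        kept.append(best)
        remaining = [b for b in remaining if iou(best, b) <= overlap_threshold]
    return kept


# Two heavily overlapping windows and one isolated window (made-up values).
windows = [(10, 10, 24, 24, 0.9), (12, 11, 24, 24, 0.8), (100, 80, 24, 24, 0.7)]
result = nms(windows)
```

In this example the second window overlaps the first with an IoU of roughly 0.78, so only the first and third windows are kept.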
2.2. Structure of the CNNs
6 CNNs in total (3 binary classifiers, 3 bbox calibration networks)
2.2.1 12-net
Slide a 12x12 detection window over the image with stride=4 (note: this is the stride of the sliding window, not a convolution). Each 12x12 window is classified by 12-net, and the 90% of detection windows with the lowest scores are rejected.
Because faces appear at different scales in an image, an image pyramid is used to cover the different scales.
If the minimum face size in the image is F, the image is resized to image_size * 12 / F so that the smallest face matches the 12x12 window.
E.g.: the image is 800x600 and the minimum face is 40x40, so the image is scaled to 240x180 and the smallest face becomes 12x12.
At this scale, the image yields [(240-12)/4 + 1] * [(180-12)/4 + 1] = 58 * 43 = 2,494 detection windows.
Note: in practice, the scores of these detection windows are represented as a map, where the value at each location is the confidence of the corresponding bbox.
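The scaling and window-count arithmetic in the 800x600 example can be checked with a short script. The helper names `scaled_size` and `count_windows` are hypothetical; only the 12x12 window, the stride of 4, and the example sizes come from the text.

```python
def scaled_size(width, height, min_face, window=12):
    """Scale the image so that a face of size min_face becomes window x window."""
    scale = window / min_face
    return round(width * scale), round(height * scale)


def count_windows(width, height, window=12, stride=4):
    """Number of window positions when sliding with the given stride."""
    cols = (width - window) // stride + 1
    rows = (height - window) // stride + 1
    return cols * rows


w, h = scaled_size(800, 600, min_face=40)
n = count_windows(w, h)
print(w, h, n)  # 240 180 2494
```

The 58 x 43 grid of positions is also the shape of the confidence map for this pyramid level.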
2.2.2 12-calibration-net
The network uses 45 predefined [s,x,y] patterns to correct position and size.
Each detection window that survives 12-net is fed into this network, which outputs 45 confidence scores, one per pattern.
The [s,x,y] patterns whose confidence exceeds a threshold t are averaged, and the averaged [s,x,y] is used to correct the x, y, w and h of the bbox.
Correction formula: [figure: correction formula]
Then, do NMS.
After this step, the detection windows are no longer 12x12.
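The calibration step can be sketched roughly as below. The correction formula appears only as a figure in the post, so this sketch assumes the pattern values and the adjustment form (x - x_n*w/s_n, y - y_n*h/s_n, w/s_n, h/s_n) as I recall them from the Cascade CNN paper. The pattern lists, the threshold default `t=0.5`, and the name `calibrate` should all be read as assumptions rather than confirmed details.

```python
from itertools import product

# Assumed calibration patterns: 5 scales x 3 x-offsets x 3 y-offsets = 45.
SCALES = [0.83, 0.91, 1.0, 1.10, 1.21]
OFFSETS = [-0.17, 0.0, 0.17]
PATTERNS = list(product(SCALES, OFFSETS, OFFSETS))  # (s, x, y) triples


def calibrate(box, scores, t=0.5):
    """Average the patterns with score > t and apply them to box (x, y, w, h)."""
    chosen = [p for p, c in zip(PATTERNS, scores) if c > t]
    if not chosen:
        return box  # no confident pattern: leave the window unchanged
    s = sum(p[0] for p in chosen) / len(chosen)
    dx = sum(p[1] for p in chosen) / len(chosen)
    dy = sum(p[2] for p in chosen) / len(chosen)
    x, y, w, h = box
    # Assumed adjustment form: shift by the averaged offset, rescale by 1/s.
    return (x - dx * w / s, y - dy * h / s, w / s, h / s)


# Only the identity pattern (s=1.0, x=0, y=0) is confident here (made-up scores).
scores = [0.0] * 45
scores[PATTERNS.index((1.0, 0.0, 0.0))] = 0.9
unchanged = calibrate((10, 20, 12, 12), scores)
```

Because the corrected width and height are w/s and h/s, the calibrated windows generally stop being exactly 12x12.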
2.2.3 24-net
Each remaining window is resized to 24x24 and fed into 24-net for classification.
At the same time, the window is also resized to 12x12 and passed through 12-net; the output of 12-net's FC layer is concatenated with the FC layer of 24-net before the final classification.
Low-confidence detection windows are rejected.
2.2.4 24-calibration-net
It works in much the same way as 12-calibration-net.
Then, do NMS.
After this step, the detection windows are no longer 24x24.
2.2.5 48-net
(details omitted)
At this stage NMS must be performed across the images of all scales (because an image pyramid is used); after the subsequent position and size calibration, the results can be output.
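Running NMS across scales presupposes that windows from different pyramid levels share one coordinate frame. A minimal sketch of that bookkeeping follows, assuming each level is stored with its scale factor relative to the original image. The names `to_original` and `merge_scales` and the dictionary layout are hypothetical.

```python
def to_original(box, scale):
    """Map (x, y, w, h, score) from a scaled pyramid level to the original image."""
    x, y, w, h, score = box
    return (x / scale, y / scale, w / scale, h / scale, score)


def merge_scales(detections_by_scale):
    """Collect windows from every pyramid level in original-image coordinates."""
    merged = []
    for scale, boxes in detections_by_scale.items():
        merged.extend(to_original(b, scale) for b in boxes)
    return merged


# The same face found at two pyramid levels (made-up values).
levels = {0.5: [(10, 20, 24, 24, 0.9)], 0.25: [(5, 10, 12, 12, 0.8)]}
merged = merge_scales(levels)
```

Here both windows map to the same 48x48 region of the original image, which is exactly the kind of duplicate that NMS over all scales removes.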
2.2.6 48-calibration-net
(details omitted)
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
2.3 Training Process
The classifiers are trained with positive (face) and negative (non-face) samples.