Face recognition face detection
Background introduction
Face detection takes an image and finds the position of every face in it. The input is an image img; the output is a set of rectangles (x, y, w, h), one framing each face.
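The (x, y, w, h) output format can be illustrated with a minimal sketch. It assumes that (x, y) is the top-left corner of the rectangle in pixels, which the text does not state explicitly; the names Box, to_corners and clip_to_image and the sample detections are hypothetical, chosen only for illustration.

```python
from typing import List, Tuple

# Assumption: (x, y) is the top-left corner, w and h are width and height in pixels.
Box = Tuple[int, int, int, int]


def to_corners(box: Box) -> Tuple[int, int, int, int]:
    """Convert (x, y, w, h) to (x1, y1, x2, y2) corner coordinates."""
    x, y, w, h = box
    return x, y, x + w, y + h


def clip_to_image(box: Box, img_w: int, img_h: int) -> Box:
    """Clip a box so that it lies entirely inside an img_w x img_h image."""
    x1, y1, x2, y2 = to_corners(box)
    x1, y1 = max(0, x1), max(0, y1)
    x2, y2 = min(img_w, x2), min(img_h, y2)
    return x1, y1, max(0, x2 - x1), max(0, y2 - y1)


# Hypothetical detector output for a 640x480 image: two face rectangles,
# the second of which sticks out past the right image border.
detections: List[Box] = [(100, 80, 60, 60), (600, 200, 80, 80)]
clipped = [clip_to_image(b, 640, 480) for b in detections]
```

A real detector would produce the list of rectangles from pixel data; here the list is hard-coded so that only the data format is shown.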
Face detection is very easy for humans. Because of the needs of social life, our brains have a dedicated face detection module that is highly sensitive to faces: even in a simple line drawing, the brain readily picks out the faces and their expressions. Why does automatic face detection matter, and what is it used for? The following points are taken from a 2015 survey (A Survey, 2015):
Automatic face detection is the basis of every application of automatic face image analysis, including but not limited to: face recognition and verification, face tracking in surveillance, facial expression analysis, facial attribute recognition (gender and age recognition, attractiveness estimation), facial relighting and morphing, facial shape reconstruction, image and video retrieval, and digital photo album organization and presentation.
Face detection is the first step of all modern vision-based human-computer and human-robot interaction systems.
Mainstream commercial digital cameras embed face detection to assist autofocus.
Many social networks, such as Facebook, use face detection to implement image and person tagging.
From the problem's point of view, face detection belongs to the field of object detection (also called target detection), which falls into two broad categories:
General object detection: detect objects of many categories in an image. For example, the ILSVRC2017 DET task covers 200 categories and VOC2012 covers 20. At its core, general object detection is an n (objects) + 1 (background) = n+1 class classification problem. Such detectors are usually relatively large and slow, and very few SOTA methods run in real time on a CPU.
Category-specific object detection: detect only one specific kind of object in an image, such as faces, pedestrians, or vehicles. At its core this is a 1 (object) + 1 (background) = 2 class classification problem. Such detectors are usually relatively small and face very high speed requirements; running in real time on a CPU is the basic requirement.
From a historical perspective, the role of deep learning is very clear:
Pre-deep-learning phase: the classic detection algorithms of this period were proposed for specific objects. For example, Viola-Jones (VJ, CVPR 2001) targets face detection, and HOG+SVM (CVPR 2005) targets pedestrian detection. DPM (TPAMI 2010) can detect all kinds of objects, but for multi-class detection it needs a separately trained template for each category, which for 200 categories amounts to 200 category-specific detection problems.
Deep learning phase: the classic detection algorithms of this period were proposed for general objects, such as the higher-accuracy Faster R-CNN and R-FCN series and the faster YOLO and SSD series. Deep learning is powerful enough to handle multi-class detection with a single CNN (1 model versus 200, so is the CNN really slow?). Although these are multi-class methods, they can all be applied to single-class problems. The current state of the art (SOTA) for category-specific problems such as face detection and pedestrian detection consists of targeted improvements to these methods.
Deep-learning-based CV research currently focuses on general object detection, and these methods are also effective for face detection. Why not simply use them directly? Why study face detection as a separate problem?
Faster R-CNN series: the advantage of these methods is high accuracy. The disadvantage is that they are slow and cannot run in real time even on a GPU, so they cannot meet the extremely high speed requirements of face detection. Since accuracy is not the problem, research on this family focuses on improving efficiency.
SSD series: the advantage of these methods is speed, with real-time performance on a GPU. The disadvantage is poor detection of dense small objects, and faces are exactly that. Research on this family focuses on improving the detection of dense small objects while keeping speed as high as possible, because algorithms that are real-time only on a GPU still have limited applicability.
Face detection also has its own family of cascaded CNN methods, which will be introduced later. For now, face detection research rides on the coattails of general object detection. This is the status quo, but meeting the high requirements on both speed and accuracy remains challenging.
Evaluation metrics
To evaluate a face detection algorithm, three metrics are commonly used:
Recall: the more faces a detector finds, the better. Because the number of faces varies from image to image, this is measured as a ratio, the recall rate, rather than as an absolute count. The closer a detected rectangle is to the manually annotated rectangle, the better the result; a face is generally counted as detected if the IoU (intersection over union) of the two boxes is greater than 0.5. So recall = number of detected faces / total number of faces in the images.
False positives: a detector also makes mistakes and may take other things for faces; the fewer the better. This metric is the absolute number of false detections. In contrast to recall, a detection is counted as a false positive if its IoU with every manually annotated box is less than 0.5. On FDDB, for example, papers generally compare recall at 1000 or 2000 false positives, while industrial applications usually look at recall at 100 or 200 false positives.
Detection speed: every algorithm competes on speed, and face detection all the more so. The less time a detector needs per image, the better; speed is usually expressed in frames per second (FPS). There is a catch, however: for many detectors, the smaller the image, the fewer the faces in it, and the larger the minimum face size to be detected, the faster the detection. Test images and test environments may therefore differ from paper to paper. For test images, the most common configuration reports speed on VGA (640*480) images with a minimum face size of 80*80, but papers rarely state how complex the background is or how many faces the image contains (the figure could even be the speed on a blank white image). For test environments the differences are even greater: CPUs differ in model, clock frequency, and multi-core, multi-thread use, and GPUs differ in model as well. (Editor's note: personally I find this view questionable. Frame-rate assessment should not depend on background complexity; algorithms run on the same data are comparable, and authors do in practice report worst and best cases.)
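The FPS figure discussed above can be measured with a simple timing loop. The following is a minimal sketch, not a benchmarking protocol: measure_fps and dummy_detect are hypothetical names, and the stand-in detector only sums pixel rows so that the example runs without any vision library. As the text notes, the result depends on the images, the minimum face size, and the hardware, so numbers are only comparable when those are held fixed.

```python
import time
from typing import Any, Callable, Sequence


def measure_fps(detect: Callable[[Any], Any], images: Sequence[Any],
                warmup: int = 1, repeats: int = 3) -> float:
    """Return the average frames per second of `detect` over `images`."""
    if not images:
        raise ValueError("need at least one image")
    for img in images[:warmup]:
        detect(img)  # warm-up calls are not timed
    start = time.perf_counter()
    for _ in range(repeats):
        for img in images:
            detect(img)
    elapsed = time.perf_counter() - start
    if elapsed <= 0:
        return float("inf")  # timer resolution too coarse for this workload
    return (repeats * len(images)) / elapsed


def dummy_detect(img):
    # Hypothetical stand-in for a real detector: touches every pixel once.
    return [sum(row) for row in img]


# A blank VGA-sized "image" (480 rows of 640 zeros), repeated five times.
vga_like = [[0] * 640 for _ in range(480)]
fps = measure_fps(dummy_detect, [vga_like] * 5)
```

Reporting the image size, the number of faces per image, and the CPU or GPU model next to the FPS value makes the figure easier to compare across papers.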
In general, the more false positives are allowed, the higher the recall. For a fair comparison, compare recall at the same number of false positives, and compare speed in the same test environment on the same images. The example figure illustrates the metrics: the image contains 7 faces (yellow ellipses) and a detector returns 8 detections (green boxes), of which 5 are correct and 3 are wrong. The number of false positives is then 3 and the recall is 5/7 = 71.43%.
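The recall and false-positive counts in this example can be reproduced with a short sketch. It assumes axis-aligned (x, y, w, h) boxes, greedy one-to-one matching in the order the detections are given, and that a second detection of an already matched face counts as a false positive; none of these details is specified in the text, and the box coordinates below are made up to mirror the 7-face, 8-detection example.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x, y, w, h), top-left corner assumed


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    iw = min(ax + aw, bx + bw) - max(ax, bx)
    ih = min(ay + ah, by + bh) - max(ay, by)
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    return inter / (aw * ah + bw * bh - inter)


def evaluate(gt: List[Box], dets: List[Box], thr: float = 0.5):
    """Return (recall, false_positives) using greedy one-to-one matching."""
    matched = set()
    true_pos = 0
    false_pos = 0
    for det in dets:
        best_iou, best_j = 0.0, -1
        for j, g in enumerate(gt):
            if j in matched:
                continue  # each annotated face can be matched only once
            v = iou(det, g)
            if v > best_iou:
                best_iou, best_j = v, j
        if best_iou > thr:
            matched.add(best_j)
            true_pos += 1
        else:
            false_pos += 1
    recall = true_pos / len(gt) if gt else 0.0
    return recall, false_pos


# Seven annotated faces in a row (made-up coordinates).
gt_faces = [(i * 100, 0, 50, 50) for i in range(7)]
# Eight detections: five close to a face, three that miss.
detections = [(i * 100 + 5, 0, 50, 50) for i in range(5)]
detections += [(0, 200, 50, 50), (300, 200, 50, 50), (540, 0, 50, 50)]

recall, false_pos = evaluate(gt_faces, detections)
print(f"recall = {recall:.2%}, false positives = {false_pos}")
```

With scored detections, sorting them by confidence before matching and sweeping the score threshold would trace out the recall versus false-positive curve that benchmark comparisons are read from.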
Common datasets
There are many test datasets for face detection; only FDDB and WIDER FACE are covered here. Both are officially maintained over the long term, and many algorithms submit results to them for comparison. Many of the earlier datasets are now saturated and no longer meaningful.
The first is FDDB, the 2010 benchmark for face detection in unconstrained settings:
Jain V, Learned-Miller E. FDDB: A benchmark for face detection in unconstrained settings [R]. Technical Report UM-CS-2010-009, University of Massachusetts, Amherst, 2010.
FDDB contains 2845 images with 5171 annotated faces. The faces are unconstrained and difficult: they include varied expressions, double chins, lighting changes, clothing and accessories, exaggerated hairstyles, and occlusion. It is the most commonly used dataset for this task and has the following characteristics:
Image resolution is low: the longer side of every image is scaled to 450, so no image is larger than 450*450. The smallest annotated face is 20*20. Both color and grayscale images are included;
Each image contains few faces, 1.8 per image on average, and most images contain only one face;
The dataset is fully public. Published methods usually come with papers, and most have open-source code and can be reproduced, so they are highly credible. Unpublished methods have neither paper nor code, and there is no way to confirm that their training set is fully isolated from FDDB, so they deserve skepticism and are usually not compared against. (Throw a few FDDB images into the training set and even VJ can reach a high recall; whether one resists that temptation is a matter of character.)
There are two evaluation protocols: unrestricted training on other, isolated datasets followed by testing on FDDB, and 10-fold cross-validation on FDDB itself. Because FDDB has few images, the results submitted by papers in recent years all use the first protocol, so if you want to submit results alongside published methods, please follow it. Mr. Shan Shiguang has also noted that 10-fold cross-validation results are usually 1-3% higher.
Results are reported as a discrete-score curve (discROC) and a continuous-score curve (contROC). discROC only cares whether the IoU is greater than 0.5, while contROC rewards a larger IoU. Because everyone trains on other datasets and tests on FDDB, a detector inherits the annotation style of its training set, which in turn affects contROC. discROC is therefore the more important of the two; contROC is worth a glance but not worth worrying about.
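The difference between the two scores can be sketched on precomputed IoU values. This is an illustration under stated assumptions rather than the official evaluation code: discrete_score and continuous_score are hypothetical names, and the IoU lists for the two detectors are made up to show how a different box style changes the continuous score while leaving the discrete score untouched.

```python
from typing import List


def discrete_score(iou_value: float, thr: float = 0.5) -> float:
    """1 if the detection overlaps its matched face enough, else 0."""
    return 1.0 if iou_value > thr else 0.0


def continuous_score(iou_value: float) -> float:
    """The IoU itself: tighter agreement with the annotation scores higher."""
    return iou_value


# Made-up IoU values of three matched detections for two detectors.
# Detector A draws boxes in a different style from the test annotations,
# detector B in a similar style; both find all three faces.
ious_a: List[float] = [0.55, 0.60, 0.58]
ious_b: List[float] = [0.90, 0.92, 0.88]

disc_a = sum(discrete_score(v) for v in ious_a)
disc_b = sum(discrete_score(v) for v in ious_b)
cont_a = sum(continuous_score(v) for v in ious_a)
cont_b = sum(continuous_score(v) for v in ious_b)
```

Both detectors get the same discrete total, but detector A's continuous total is much lower, which is why the continuous curve mostly reflects how closely the training annotations resemble the test annotations.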