PR and ROC curves in AI machine learning
1. Precision, recall, and F1
For binary classification, each sample falls into one of four outcomes according to the combination of its true class and the learner's predicted class: true positive (TP), false negative (FN), false positive (FP), or true negative (TN). The resulting confusion matrix is as follows:

| | Predicted positive | Predicted negative |
| --- | --- | --- |
| Actually positive | TP | FN |
| Actually negative | FP | TN |
Precision P and recall R are defined as:

P = TP / (TP + FP)
R = TP / (TP + FN)

Precision asks "of the examples predicted positive, how many are truly positive", i.e., how reliably positives are picked out from everything labeled positive. Recall asks "of the truly positive examples, how many are found", i.e., how completely the positives are retrieved.
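As a minimal sketch of the two formulas (plain Python; 1 = positive, 0 = negative; the example labels are invented for illustration):

```python
def precision_recall(y_true, y_pred):
    """Compute precision and recall for binary labels (1 = positive, 0 = negative)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# 3 of the 4 predicted positives are correct -> P = 0.75;
# 3 of the 5 actual positives are found     -> R = 0.6.
p, r = precision_recall([1, 1, 1, 1, 1, 0, 0, 0],
                        [1, 1, 1, 0, 0, 1, 0, 0])
```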
The two are a pair of conflicting metrics. Precision embodies "better to miss than to mislabel" and suits applications that demand high accuracy, such as product recommendation and web search. Recall embodies "better to wrongly flag a hundred than let one slip through" and suits tasks such as screening for smuggling or hunting fugitives.
The figure below shows the precision-recall curve (P-R curve).
[Figure: precision-recall (P-R) curves]
If one learner's P-R curve is completely "enclosed" by another's, the latter performs better than the former. When the curves cross, the areas they enclose can be compared, but that is cumbersome; the break-even point (BEP, where precision = recall) is a simpler measure.
However, the BEP is still too simplistic. The more commonly used F1 and Fβ metrics are the harmonic mean and weighted harmonic mean of precision and recall, defined as:

F1 = 2 P R / (P + R)
Fβ = (1 + β²) P R / (β² P + R)

where β > 1 weights recall more heavily and β < 1 weights precision more heavily.
Note that substituting P = R into the F1 formula gives F1 = P = R, so at the balance point F1 equals the BEP; hence if learner A's BEP is higher than learner B's, A's F1 at that point is also higher than B's.
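Continuing the sketch, the F-measure (the 0.75 and 0.6 inputs are just the illustrative values from above):

```python
def f_beta(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall; beta = 1 gives F1."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

f1 = f_beta(0.75, 0.6)           # 2*0.75*0.6 / (0.75+0.6) ≈ 0.667
f2 = f_beta(0.75, 0.6, beta=2)   # beta > 1 weights recall more heavily
```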
2. ROC and AUC
Many learners produce a real-valued or probability prediction for each test sample and compare it with a classification threshold: above the threshold the sample is classified as positive, otherwise as negative. The classification process can therefore be viewed as choosing a cut point in the ranked list of predictions.
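Concretely, choosing a cut point is just thresholding the ranked scores (the scores here are invented for illustration):

```python
scores = [0.92, 0.85, 0.71, 0.44, 0.30]  # predictions, sorted high to low
threshold = 0.5
preds = [1 if s >= threshold else 0 for s in scores]  # -> [1, 1, 1, 0, 0]
```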
Different tasks call for different cut points: if precision matters more, the cut should be placed near the top of the ranking; if recall matters more, it should be placed further down. The quality of the ranking itself thus directly reflects the learner's generalization performance, and the ROC curve is a tool for studying a learner from this angle.
The curve's two coordinates are the true positive rate (TPR) and the false positive rate (FPR), defined as:

TPR = TP / (TP + FN)
FPR = FP / (FP + TN)
The figure below illustrates the ROC curve. Real-world tasks draw the ROC plot from a finite number of test samples, so the curve cannot be smooth and instead appears as a step function, as in the right panel.
[Figure: ROC curves; left: ideal smooth curve, right: step-shaped curve from a finite test set]
The drawing process is simple (a code sketch after the worked example below reproduces it). Given m positive examples and n negative examples, sort all examples by the learner's predicted score. First set the classification threshold to the maximum, so every example is predicted negative; TPR and FPR are both 0, so mark a point at (0, 0). Then lower the threshold to each example's predicted score in turn, i.e., classify the examples as positive one by one. Let the previous point be (x, y): if the current example is a true positive, mark the next point at (x, y + 1/m); if it is a false positive, mark it at (x + 1/n, y). Finally connect the points in order.
Here is a drawing example:
Suppose there are 10 examples: 5 positive and 5 negative. Two learners, A and B, make predictions for all 10, and the examples are sorted from high to low by predicted score (the scores themselves are not listed here):
A: [neg, pos, pos, pos, neg, neg, pos, pos, neg, neg]
B: [neg, pos, neg, neg, neg, pos, pos, pos, pos, neg]

(true labels in ranked order, reconstructed from the curve points below)
Following the drawing procedure, each learner's ROC curve points are obtained:
A: y: [0, 0, 0.2, 0.4, 0.6, 0.6, 0.6, 0.8, 1, 1, 1]
   x: [0, 0.2, 0.2, 0.2, 0.2, 0.4, 0.6, 0.6, 0.6, 0.8, 1]
B: y: [0, 0, 0.2, 0.2, 0.2, 0.2, 0.4, 0.6, 0.8, 1, 1]
   x: [0, 0.2, 0.2, 0.4, 0.6, 0.8, 0.8, 0.8, 0.8, 0.8, 1]
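The procedure is easy to check in code. A minimal sketch (plain Python; the label sequences are the ranked true labels listed above, 1 = positive, 0 = negative):

```python
def roc_points(labels_by_rank, m, n):
    """Walk the ranked labels, stepping up 1/m for each true positive
    and right 1/n for each false positive, as in the procedure above."""
    xs, ys = [0.0], [0.0]
    x = y = 0.0
    for label in labels_by_rank:
        if label == 1:
            y += 1 / m  # true positive: move up
        else:
            x += 1 / n  # false positive: move right
        xs.append(round(x, 10))
        ys.append(round(y, 10))
    return xs, ys

A = [0, 1, 1, 1, 0, 0, 1, 1, 0, 0]  # learner A's ranked true labels
B = [0, 1, 0, 0, 0, 1, 1, 1, 1, 0]  # learner B's ranked true labels
xa, ya = roc_points(A, m=5, n=5)  # reproduces A's x/y lists above
xb, yb = roc_points(B, m=5, n=5)  # reproduces B's x/y lists above
```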
The resulting curves are as follows:
[Figure: ROC curves of learners A (blue) and B]
The blue curve, learner A's, completely encloses B's, which shows that A performs better. The same is evident from the ranked labels of the 10 examples: A places more positive examples near the top of the ranking than B does. If two ROC curves cross, the areas under them (AUC) must be computed to compare performance.
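Continuing the roc_points sketch above, the AUC of each step curve can be computed with the trapezoidal rule:

```python
def auc(xs, ys):
    """Area under a piecewise-linear curve via the trapezoidal rule."""
    return sum((xs[i + 1] - xs[i]) * (ys[i + 1] + ys[i]) / 2
               for i in range(len(xs) - 1))

auc_a = auc(xa, ya)  # 0.64 for learner A's points
auc_b = auc(xb, yb)  # 0.32 for learner B's points
```

The higher AUC for A agrees with its curve enclosing B's.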
3. Bias and variance
Generalization error can be decomposed into the sum of bias, variance, and noise.
Bias measures the deviation between the learning algorithm's expected prediction and the true result.
Variance measures the change in performance caused by variations among training sets of the same size, i.e., the effect of data perturbations.
Noise can be viewed as the variability inherent in the data itself; it expresses the lower bound on the generalization error that any learning algorithm can achieve on the task.
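For squared loss, the decomposition can be written out as follows (a standard formulation; here \(\bar{f}(x) = \mathbb{E}_D[f(x; D)]\) is the expected prediction, \(y\) the true label, and \(y_D\) the observed label in training set \(D\)):

$$
E(f; D) = \underbrace{\bigl(\bar{f}(x) - y\bigr)^2}_{\text{bias}^2}
        + \underbrace{\mathbb{E}_D\bigl[(f(x; D) - \bar{f}(x))^2\bigr]}_{\text{variance}}
        + \underbrace{\mathbb{E}_D\bigl[(y_D - y)^2\bigr]}_{\text{noise}}
$$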
Large bias indicates underfitting; large variance indicates overfitting.