Machine learning algorithm summary
1. Normalization of feature values.
Objective: to put all feature values on the same scale.
Reason: some algorithms use the raw feature values directly in their computations.
Take LR (logistic regression) as an example: it is trained with gradient descent, and the feature values enter directly into the gradient. If the features are not normalized, then at a single learning rate the differently scaled features cannot all be learned well (a rate that suits one scale is unsuitable for another). To keep training stable the learning rate then has to be set very small, and a small learning rate makes the algorithm learn slowly.
P.S.: tree-based methods do not require feature normalization, because tree models do not use the magnitudes of the feature values in their computations; they only use how the samples are distributed over a feature's values, and compute the split gain from those proportions, so normalization is unnecessary.
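A minimal sketch of the normalization step (scikit-learn; the two-feature synthetic data and all names are made up for illustration). Any gradient-based solver benefits in the same way:

```python
# Put two wildly different feature scales onto a common scale
# before fitting a gradient-based learner such as LR.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Feature 0 lies in [0, 1]; feature 1 lies in [0, 10000].
X = np.column_stack([rng.random(1000), rng.random(1000) * 10_000])
y = (X[:, 0] + X[:, 1] / 10_000 > 1.0).astype(int)

# Standardize to zero mean / unit variance so one learning rate suits all features.
X_scaled = StandardScaler().fit_transform(X)

clf = LogisticRegression().fit(X_scaled, y)
print(clf.coef_)  # coefficients are now on comparable scales
```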
2. For a binary classification problem where one class is very small (say there are very few negative samples), at what ratio of positive to negative samples can the algorithm adequately learn the information in the minority class (i.e., how do we tell whether the negative samples have been learned sufficiently)?
Look at the algorithm's bad cases.
Suppose we have a batch of samples with features and labels, split into a training set and a test set; the ratio of positive to negative samples differs between the two sets, and negatives are the minority in both. Take a machine learning algorithm, train it on the training set, and score the test set. Since the test labels are known, we can see which test samples are classified correctly and which are not.
If most of the negative samples in the test set are classified as positive, the algorithm has not learned the negatives sufficiently. Remedies: increase the number of negative samples; develop new features.
If most of the negative samples in the test set are correctly classified as negative, the algorithm has learned the negatives sufficiently.
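A minimal sketch of this bad-case check (scikit-learn on synthetic imbalanced data; here class 1 plays the role of the minority class, and all parameters are illustrative):

```python
# Train on an imbalanced set, then count how many minority-class test samples
# were misclassified -- the bad-case check described above.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=5000, weights=[0.95, 0.05], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
pred = model.predict(X_te)

minority = y_te == 1
n_wrong = int(np.sum(pred[minority] != 1))
print(f"{n_wrong} of {int(minority.sum())} minority-class samples misclassified")
# If most are misclassified, the minority class is under-learned:
# add more minority samples or engineer new features.
```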
3. For a binary classification problem, when can we say an algorithm has learned the positive and negative samples sufficiently?
Compare the AUC on the test set with the AUC on the training set: if the difference is within 0.5%~1%, the model trained by this algorithm generalizes well.
Look at the bad cases, as in question 2.
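A minimal sketch of the train-vs-test AUC check (scikit-learn, synthetic data; only the 0.5%~1% rule of thumb comes from the text):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

X, y = make_classification(n_samples=5000, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

auc_tr = roc_auc_score(y_tr, model.predict_proba(X_tr)[:, 1])
auc_te = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
print(f"train AUC={auc_tr:.4f}  test AUC={auc_te:.4f}  gap={abs(auc_tr - auc_te):.4f}")
# Rule of thumb from the text: a gap within 0.5%~1% suggests good generalization.
```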
4. For a binary classification problem, once the algorithm has learned the positive and negative samples sufficiently, adding more samples of any kind changes the bias of the model but not its variance, because the model has already learned the data well.
5. Model overfitting.
When we say a model is overfitting, we mean its variance is large. When a model's bias is large but its variance is small, we consider the model comparatively stable: it has learned well, just with a systematic offset, which can be removed by a translation or similar correction, so we do not regard this as overfitting.
6. Model evaluation metrics: why not use accuracy? Which metrics should be used in different business scenarios?
Do not measure model quality with accuracy. Accuracy is unreliable because it depends on the classification threshold: different thresholds give different accuracies.
Different business scenarios call for different evaluation metrics. Take binary classification as an example.
**Scenario 1: we care most about the model's ability to distinguish and rank the two classes of samples.** General classification problems, such as credit scoring models, fall into this category. Here the evaluation metric is AUC (area under curve): the area between the ROC curve and the x-axis. Unlike accuracy, AUC weighs the model's overall performance across all thresholds rather than at a single threshold.
ROC curve
Vertical axis: TPR = TP / (TP + FN), the proportion of all actual positives that are correctly classified as positive.
Horizontal axis: FPR = FP / (FP + TN), the proportion of all actual negatives that are incorrectly classified as positive.
AUC is the area under the ROC curve; it measures the model's ability to classify and rank the different classes. Usually the higher the TPR and the lower the FPR, the better the classifier, so the more the ROC curve bulges toward the upper left, the better. AUC captures a trade-off between "positives classified correctly" and "negatives classified incorrectly": it considers both classes and does not favor a single class.
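A minimal sketch of computing the ROC curve and AUC with scikit-learn (the scores are made-up toy values; in practice use the model's predicted probabilities):

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1])
scores = np.array([0.1, 0.3, 0.35, 0.8, 0.4, 0.6, 0.7, 0.9])

# roc_curve sweeps the threshold: FPR = FP/(FP+TN), TPR = TP/(TP+FN).
fpr, tpr, thresholds = roc_curve(y_true, scores)
print("AUC:", roc_auc_score(y_true, scores))  # area under the ROC curve
```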
**Scenario 2: the samples are very unbalanced, and what matters is to retrieve as many samples as possible of the class we need (the one with few samples) while keeping the false positive rate on the other class comparatively low.** These are generally retrieval-type problems; anti-fraud scenarios fall into this category. Here the PR curve is used to evaluate the model.
PR curve
Vertical axis: precision = TP / (TP + FP), the proportion of samples predicted positive that are actually positive.
Horizontal axis: recall (also called sensitivity) = TP / (TP + FN), the proportion of all actual positive samples that are predicted positive.
Precision and recall interact: ideally both would be as high as possible, but in general they trade off against each other. The PR curve is a trade-off between the recall and precision of the positive class (label = 1).
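A minimal sketch of computing the PR curve (scikit-learn; the toy scores are illustrative):

```python
import numpy as np
from sklearn.metrics import precision_recall_curve, average_precision_score

y_true = np.array([0, 0, 0, 0, 0, 0, 1, 1])        # positives are the minority
scores = np.array([0.1, 0.2, 0.3, 0.35, 0.5, 0.8, 0.6, 0.9])

# precision = TP/(TP+FP) and recall = TP/(TP+FN) at each score threshold.
precision, recall, thresholds = precision_recall_curve(y_true, scores)
print("average precision (area under the PR curve):",
      average_precision_score(y_true, scores))
```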
Why not use AUC? Because when the sample is very unbalanced, the model can show a high AUC even if it does not distinguish the minority class well: the majority class is so large that its many correct classifications push the overall AUC up. For example, in a credit model (good:bad ratio = 10:1), an AUC of 0.8 is empirically considered to show that the model separates good users from bad users well; but in an anti-fraud scenario, where the imbalance is far more extreme, a high AUC does not guarantee that the minority (fraud) class is identified well, which is why the PR curve is preferred.
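A minimal sketch of this effect (scikit-learn on synthetic data; the 99:1 ratio and all parameters are illustrative choices, not from the text): on a heavily imbalanced set the ROC AUC can look strong while the PR-based average precision, which tracks the minority class, is much lower.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, average_precision_score

# Heavy imbalance (~99:1) with some label noise to keep the task imperfect.
X, y = make_classification(n_samples=20_000, weights=[0.99, 0.01],
                           flip_y=0.02, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
p = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).predict_proba(X_te)[:, 1]

print("ROC AUC:", roc_auc_score(y_te, p))            # can be high despite imbalance
print("PR  AUC:", average_precision_score(y_te, p))  # lower; reflects minority class
```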