AI Machine Learning: Optimization Algorithms Study Notes
In machine learning, optimization algorithms are the methods used to solve optimization problems, and choosing and applying them correctly is a crucial part of model training. Which method to use in which situation, and how each one is derived, are recorded below for future reference; the particular goal is to grasp the essence of each method.
Optimization problems can be roughly divided into two categories: unconstrained optimization problems and constrained optimization problems. For constrained optimization, the constraints are further classified into equality constraints and inequality constraints. Let us look at the methods in turn.
For unconstrained problems, several algorithms are commonly used: the gradient descent method, the improved iterative scaling method, Newton's method, quasi-Newton methods, the least squares method, and so on.
[Gradient descent] Gradient descent, also known as steepest descent, is an iterative algorithm that moves in the direction of the negative gradient, which is the direction in which the function value decreases fastest; each step uses only this direction information. First give an initial value, then compute the gradient at the current point, then update the point by stepping along the negative gradient, repeating until convergence. The parameters that must be set are: the initial value, the step size (learning rate), and the iteration termination condition (a maximum number of iterations, or the gradient change becoming sufficiently small).
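As a minimal sketch, here is gradient descent in Python/NumPy on a toy two-dimensional quadratic; the objective, learning rate, and tolerance are illustrative assumptions rather than anything prescribed above:

```python
import numpy as np

def gradient_descent(grad, x0, lr=0.1, tol=1e-6, max_iter=1000):
    """Minimize a function given its gradient, starting from x0.

    Stops when the gradient norm drops below tol or after max_iter steps.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:   # termination condition on the gradient
            break
        x = x - lr * g                # step in the negative gradient direction
    return x

# Example: minimize f(x, y) = x^2 + 2*y^2, whose gradient is (2x, 4y).
grad_f = lambda x: np.array([2 * x[0], 4 * x[1]])
print(gradient_descent(grad_f, x0=[3.0, -2.0]))  # -> approximately [0, 0]
```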
When the objective function is convex, gradient descent finds a global optimal solution; otherwise there is no such guarantee, and it may converge to a local extremum rather than the global optimum. (The gradient descent method is one of the gradient methods: gradient descent is used for minimization, while gradient ascent is used for maximization.)
The gradient descent method comes in three variants, depending on how much training data is used per update: stochastic gradient descent (SGD), which randomly selects a single training example to compute the gradient for each update; batch gradient descent (BGD), which uses all of the training data and its average gradient for each update; and mini-batch gradient descent (MBGD), which selects a subset of the training data and uses its average gradient for each update. The three can be compared along three dimensions: in convergence (stability of the path to the optimum), BGD > MBGD > SGD; in accuracy, BGD > MBGD > SGD; in training speed, SGD > MBGD > BGD. All three update rules appear in the sketch below.
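A compact sketch of the three variants on a toy least-squares objective; the data, the learning rate of 0.1, and the mini-batch size of 16 are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                 # toy training data
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.01 * rng.normal(size=100)

def grad_on(Xb, yb, w):
    """Average gradient of the squared error on a batch (Xb, yb)."""
    return Xb.T @ (Xb @ w - yb) / len(yb)

w = np.zeros(3)
for step in range(500):
    # BGD: use the full training set for each update.
    #   w -= 0.1 * grad_on(X, y, w)
    # SGD: use one randomly chosen example for each update.
    #   i = rng.integers(len(y)); w -= 0.1 * grad_on(X[i:i+1], y[i:i+1], w)
    # MBGD: use a random subset (here of size 16) for each update.
    idx = rng.choice(len(y), size=16, replace=False)
    w -= 0.1 * grad_on(X[idx], y[idx], w)

print(w)  # -> close to w_true
```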
[Improved iterative scaling] Improved iterative scaling (IIS) reaches the optimization goal by iteratively searching for a correction to the parameters. One of its main applications is training the maximum entropy model. The algorithm bounds the change in the objective produced by a parameter change (in the maximum entropy setting, a lower bound on the change in the log-likelihood), then optimizes this bound one dimension at a time: fix all the other dimensions, differentiate the bound with respect to the remaining one, set the derivative to 0, and solve to obtain that dimension's change.
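For concreteness, here is the standard IIS update equation for a maximum entropy model P_w(y|x) with feature functions f_i (as given, e.g., in Li Hang's Statistical Learning Methods); each correction delta_i is obtained by solving a one-dimensional equation with the other dimensions held fixed, and the parameters are then updated by w_i <- w_i + delta_i:

```latex
% With f^{#}(x,y) = \sum_i f_i(x,y) the total feature count, each \delta_i
% solves a one-dimensional equation in \delta_i alone:
\[
  \sum_{x,y} \tilde{P}(x)\, P_w(y \mid x)\, f_i(x,y)\,
             e^{\delta_i f^{\#}(x,y)}
  \;=\; \sum_{x,y} \tilde{P}(x,y)\, f_i(x,y)
\]
```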
[Newton's method] Newton's method for optimization gets its name from Newton's method for solving equations, and indeed it is that root-finding method applied to a particular equation: at an extremum the first derivative is 0, so the equation to solve is that the first derivative equals 0, which requires second-order information. Expand the function around the current point x_k with a second-order Taylor expansion, differentiate with respect to x, and set the derivative to 0 at the extremum; this yields an update relating the first- and second-order derivatives, x_{k+1} = x_k - H_k^{-1} g_k, where g_k is the gradient and H_k is the second-derivative matrix at x_k. Iterating this update completes the solution. If the function is one-dimensional, only the ordinary second derivative is involved; but for a high-dimensional function the second-order term cannot be treated as a single number and instead involves the Hessian matrix, and not only the Hessian itself but also its inverse. Computing this matrix inverse is the bottleneck of Newton's method.
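A minimal sketch of Newton's method on a two-dimensional quadratic; the test function is an illustrative assumption, and note that the code solves the linear system H * step = g rather than forming the inverse explicitly, the usual way to soften the inversion bottleneck:

```python
import numpy as np

def newton_minimize(grad, hess, x0, tol=1e-8, max_iter=50):
    """Newton's method: find a stationary point by iterating x <- x - H^{-1} g."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        step = np.linalg.solve(hess(x), g)  # solve H * step = g; no explicit inverse
        x = x - step
    return x

# Example: f(x, y) = x^2 + 2*y^2 + x*y, a convex quadratic.
grad_f = lambda x: np.array([2 * x[0] + x[1], 4 * x[1] + x[0]])
hess_f = lambda x: np.array([[2.0, 1.0], [1.0, 4.0]])
print(newton_minimize(grad_f, hess_f, [5.0, 5.0]))  # -> approximately [0, 0]
```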
[Quasi-Newton methods] The name says it all: "quasi" means approximated rather than exact, and "Newton" ties it closely to Newton's method. Quasi-Newton methods exist to solve the Hessian-inversion problem of Newton's method: instead of the Hessian (or its inverse), find a substitute matrix with the same essential property, namely positive definiteness (corresponding to a minimum point; a negative definite matrix corresponds to a maximum point). (A matrix A is positive definite when the transpose of any nonzero vector, multiplied by the matrix, multiplied by the vector, i.e. x^T A x, is always positive; this guarantees the resulting search direction is a descent direction.) Writing G for the substitute matrix and H for the Hessian, the requirement is G ≈ H^{-1}. There are several concrete quasi-Newton methods. The first is the DFP (Davidon-Fletcher-Powell) algorithm, which directly approximates the inverse Hessian; the second is the BFGS (Broyden-Fletcher-Goldfarb-Shanno) algorithm, which approximates the Hessian itself, from which the inverse approximation follows; the third is the Broyden class of algorithms, which exploit the relationship between the BFGS and DFP update matrices and express the required matrix as a linear combination of the two, finally obtaining a whole family of valid updates.
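A minimal BFGS sketch that maintains G, the inverse-Hessian approximation, without ever computing a Hessian; the crude backtracking line search and the test function are illustrative assumptions:

```python
import numpy as np

def bfgs_minimize(f, grad, x0, tol=1e-8, max_iter=100):
    """Quasi-Newton (BFGS) minimization: G approximates the inverse Hessian."""
    x = np.asarray(x0, dtype=float)
    n = x.size
    G = np.eye(n)                       # initial inverse-Hessian approximation
    g = grad(x)
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        p = -G @ g                      # quasi-Newton search direction
        t = 1.0                         # crude backtracking line search
        while f(x + t * p) > f(x) + 1e-4 * t * (g @ p):
            t *= 0.5
        s = t * p                       # step taken
        x_new = x + s
        g_new = grad(x_new)
        y = g_new - g                   # change in gradient
        rho = 1.0 / (y @ s)
        I = np.eye(n)
        # BFGS update of the inverse-Hessian approximation; keeps G positive definite.
        G = (I - rho * np.outer(s, y)) @ G @ (I - rho * np.outer(y, s)) \
            + rho * np.outer(s, s)
        x, g = x_new, g_new
    return x

f = lambda x: x[0] ** 2 + 2 * x[1] ** 2 + x[0] * x[1]
grad_f = lambda x: np.array([2 * x[0] + x[1], 4 * x[1] + x[0]])
print(bfgs_minimize(f, grad_f, [5.0, 5.0]))  # -> approximately [0, 0]
```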
[Least squares method] The least squares method is used for function fitting, or for finding function extrema. Its objective is to minimize the squared error: differentiate the squared error with respect to each parameter, set each derivative to 0 (a zero derivative marks a candidate extremum), assemble all these equations into a system, and solve the system for an analytical solution. Besides this algebraic formulation there is a matrix formulation, which involves matrix derivatives. The least squares method thus solves for the analytical solution directly, and the matrix formulation requires inverting a matrix; if that inverse does not exist, this route fails and other methods must be used instead. For function fitting, the model must be linear in its parameters; this is a necessary condition, so a nonlinear model must first be transformed into a linear one before least squares can be applied. Finally, since least squares amounts to solving a system of equations, the system must actually be solvable: it must not be underdetermined, for an underdetermined system cannot be solved uniquely.
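A sketch of least squares on toy data via the normal equations X^T X w = X^T y; the data are an illustrative assumption, and np.linalg.lstsq is shown as one standard fallback when X^T X is singular:

```python
import numpy as np

# Fit y ~ w0 + w1 * x by minimizing the squared error ||X w - y||^2.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 50)
X = np.column_stack([np.ones(50), x])         # design matrix: intercept + slope
y = 2.0 + 0.7 * x + 0.1 * rng.normal(size=50)

# Analytical solution of the normal equations X^T X w = X^T y.
w = np.linalg.solve(X.T @ X, X.T @ y)
print(w)  # -> close to [2.0, 0.7]

# If X^T X is singular (the "inverse does not exist" case above),
# a pseudoinverse-based solver still returns a least-squares solution.
w_ls, *_ = np.linalg.lstsq(X, y, rcond=None)
```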